E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Backstory: In the 1990s I did graphic design for newspapers and magazines. Then I wrote technology commentary and reviews for Wired, Salon.com, Chicago Tribune, and lots of little places you've never heard of. Then I taught photographers how to create good websites. I co-wrote a book (see below) along the way. Current story: I am helping turn a giant media corporation into a digital enterprise. Feel free to email me.

Book

I'm co-author of "Python Web Development with Django", an excellent guide to my favorite web framework. Published by Addison-Wesley, it is available from Amazon and your favorite technical bookstore as well.

Colophon

Built using Django, served by Apache and mod_wsgi. The database is SQLite. The operating system is FreeBSD, on a VPS hosted at Johncompanies.com. Comment-spam protection by Akismet. Vintage topo imagery from the Maptech archive. The markup engine is Markdown.


Managing a Django project using darcs

Preamble

This article is two things:

  1. A description of one way to use version control with a Django project
  2. An introduction to using the darcs distributed version control system in particular

First, though, a mini-sermon which someday will be a post in the You Really Should series: You really should use version control. Most of you probably do. But if you're among those not using version control to manage your software projects, start now! Learn a good version control system and start using it on just one project. You'll work so much more productively and confidently that you'll want to use version control everywhere. If you're just getting started, learn Subversion -- it is more or less the standard at this point, and employers and other programmers will assume you know it.

Distributed version control

So, Subversion's great. I use it daily. However, like CVS before it, Subversion is a centralized system. All working copies of a project managed by Subversion are dependent on a central repository. This is often fine, but there's an alternative model worth knowing about: distributed version control.

Distributed version control systems have no concept of a central server; instead, each copy of a project managed with a DVCS is a fully functional repository, yet changes can still be shared between checkouts or pushed to an agreed-upon "master" repository.

There are many arguments made for DVCS, but the thing I appreciate most is the simplicity. With a DVCS, versioning information is stored right there with your other files. Repositories (really just directories with versioning info in a special subdirectory) can be copied, moved, and generally shipped around just like any other files. Turning a working directory into a managed repository is almost instantaneous. You don't have to have a server set up, you don't have to remember a separate repository path, you don't have to decide how to arrange your various projects in your master repository.

The DVCS I've been most taken with is darcs. It's simple to get started with, and uses a unique patch-oriented approach which I find very natural. The rest of this post is about using darcs in the context of working on a Django web application. Most of what I say below applies to any version control system -- it's just simpler (in my opinion) with darcs.

Getting darcs

Installing darcs is potentially daunting, because it's written in Haskell and thus requires a Haskell compiler if you're building it from source. So, unless you've decided for some crazy reason to learn Haskell too, look on the darcs website for a binary for your system.

I run FreeBSD on my servers, and though I generally build software using the ports system, the darcs port has had some trouble recently; on one server I simply installed the Linux static binary and thanks to Linux binary compatibility mode that works great.

Project layout

In this example I'm going to assume that the live version of your Django project already exists. This models a real-world scenario where you've got a functioning Django site but you realize You Really Should (tm) be using version control on it.

The goal here is to have two separate instances of the project, called testing and live. The live version is the one the public sees. The only updates to live will be pushed from the testing instance (after, you know, testing them).

The directory structure in this example puts testing instances in /django/testing/ and live instances in /django/live/.

This is in fact very similar to the arrangement I use on my server. I add /django/testing/ to my $PYTHONPATH so that when running the development server from the shell it uses the testing checkout. The live site, running under mod_python, has a PythonPath directive in its Apache configuration file that points to /django/live/.
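
To make the "which copy gets imported" point concrete, here is a minimal sketch. It builds two throwaway directories standing in for /django/testing/ and /django/live/ and asks Python's import machinery which one it would pick. The helper names (resolve_package, make_fake_instance) are mine, made up for illustration, and the temporary directories are stand-ins rather than the real paths.

```python
import importlib.machinery
import tempfile
from pathlib import Path


def resolve_package(name, search_path):
    """Return the file a package would be imported from, or None.

    search_path is an ordered list of directories, like sys.path or the
    entries in $PYTHONPATH; the first directory containing the package wins.
    """
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return spec.origin if spec else None


def make_fake_instance(parent, name="mysite"):
    """Create an empty package standing in for one checkout of the project."""
    package_dir = Path(parent) / name
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    return package_dir


with tempfile.TemporaryDirectory() as root:
    testing = Path(root) / "testing"
    live = Path(root) / "live"
    make_fake_instance(testing)
    make_fake_instance(live)
    # Development shell: testing comes first, so the testing checkout is used.
    print(resolve_package("mysite", [str(testing), str(live)]))
    # Live server: only the live directory is on the path.
    print(resolve_package("mysite", [str(live)]))
```

The order of the search path is the whole trick: the same package name resolves to a different checkout depending on which directory is listed first.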

Turning the live project into a repository

This step highlights another DVCS advantage: you don't have to "import" the current code into the repo, check it back out into a new directory, and replace the live code with the checkout. You simply turn the live project into a repository. How do you do that, you ask?

Let's say our project is called "mysite".

> cd /django/live/mysite
> ls -l
total 22
-rw-r--r--  1 pbx  wheel      0 Mar 11 15:55 __init__.py
-rw-r--r--  1 pbx  wheel    139 Mar 11 15:55 __init__.pyc
drwxr-xr-x  2 pbx  wheel    512 Mar 11 15:58 docroot
drwxr-xr-x  2 pbx  wheel    512 Mar 11 15:56 logs
-rwxr-xr-x  1 pbx  wheel    546 Mar 11 15:55 manage.py
drwxr-xr-x  2 pbx  wheel    512 Mar 11 16:09 myapp
-rw-rw-rw-  1 pbx  wheel  68608 Mar 11 16:01 mysite.db
-rw-r--r--  1 pbx  wheel   2346 Mar 11 15:57 settings.py
-rw-r--r--  1 pbx  wheel   1738 Mar 11 16:01 settings.pyc
-rw-r--r--  1 pbx  wheel    293 Mar 11 16:09 urls.py
-rw-r--r--  1 pbx  wheel    316 Mar 11 16:09 urls.pyc

This is a fairly minimal, standard Django project. I have one app ("myapp") and a SQLite database. There's also a "docroot" folder for static files, and a "logs" directory for Apache logs.

Let's make it into a repository:

> darcs init

This command creates a directory called "_darcs" inside the current directory. This is where revision history will be stored, in the form of a "pristine" copy of your source plus a collection of patches. Darcs commands can be abbreviated to any unique prefix -- darcs init is short for darcs initialize.
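
Since a repository is just a directory with a "_darcs" subdirectory in it, a script can detect one with a plain filesystem check. This is a small sketch under that single assumption; it does not inspect what is inside _darcs, and the function names are my own.

```python
import tempfile
from pathlib import Path


def is_darcs_repo(directory):
    """True if the directory contains a "_darcs" subdirectory."""
    return (Path(directory) / "_darcs").is_dir()


def find_repo_root(start):
    """Walk upward from start and return the first directory holding _darcs.

    Returns None when no enclosing directory looks like a repository.
    """
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if is_darcs_repo(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    project = Path(root) / "mysite"
    # Stand-in for what "darcs init" creates; no real repository data here.
    (project / "_darcs").mkdir(parents=True)
    (project / "myapp").mkdir()
    print(is_darcs_repo(project))
    print(find_repo_root(project / "myapp"))
```

The upward walk mirrors how you can run darcs commands from a subdirectory such as myapp and still have them apply to the enclosing project.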

The _darcs directory also contains preferences related to the repository. One of these is a file called "boring", a list of regular expressions matching file names that should not be versioned. The default is quite extensive; for example, it covers .pyc and .pyo files, common backup files, and directories used by other version control systems. My additions tell it to also ignore the "logs" directory and the database, neither of which I want under version control. (Handling test vs. live databases is a much larger subject, but this works for the example here.)

> cat >> _darcs/prefs/boring 
^logs/
^mysite\.db$
>
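
If you want to sanity-check boring patterns before trusting them, you can try them against a few file names. The sketch below uses Python's re module as a stand-in for the darcs regex engine and assumes paths are matched relative to the repository root; both are assumptions on my part, so treat it as a rough check rather than a faithful emulation. The first pattern is a hypothetical stand-in for the default rule covering compiled Python files, and the dot in mysite.db is escaped here so that it only matches a literal dot.

```python
import re

BORING_PATTERNS = [
    r"\.py[co]$",     # hypothetical stand-in for the default .pyc/.pyo rule
    r"^logs/",        # anything inside the logs directory
    r"^mysite\.db$",  # the SQLite database file
]


def is_boring(path, patterns=BORING_PATTERNS):
    """True if any pattern matches the repository-relative path."""
    return any(re.search(pattern, path) for pattern in patterns)


for path in ["settings.py", "settings.pyc", "logs/access.log", "mysite.db"]:
    print(path, "->", "boring" if is_boring(path) else "not boring")
```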

With that settled, we can add the non-boring files to the repository:

> darcs add -r *
Skipping boring file __init__.pyc
...

(Full list of boring files omitted. They're boring.)

Now let's actually record our changes, using the darcs record command. This is analogous to the "commit" operation in a centralized version control system like Subversion or CVS.

This one looks a little long at first glance. There's a one-time query for your email address when you first set up a new repository (you can take care of this for all repositories with a ~/.darcs/author file).

> darcs record
Darcs needs to know what name (conventionally an email address) to use as the
patch author, e.g. 'Fred Bloggs <fred@bloggs.invalid>'.  If you provide one
now it will be stored in the file '_darcs/prefs/author' and used as a default
in the future.  To change your preferred author address, simply delete or edit
this file.

What is your email address? pb@e-scribe.com
adddir ./docroot
Shall I record this change? (1/?)  [ynWsfqadjkc], or ? for help: a
What is the patch name? Initial record
Do you want to add a long comment? [yn]n
Finished recording patch 'Initial record'

I could have done this as a one-liner (darcs rec -am "Initial record") but the longer form shows one of the nice things about darcs: it's perfectly happy to quiz you about what exactly you want to do. This comes in very handy in certain situations, e.g. when you want to record only certain changes, or when you want to send only certain patches to a remote repository.

To review, here's the condensed version (minus the email and _darcs/prefs/boring parts):

> cd /django/live/mysite
> darcs init
> darcs add -r * 
> darcs rec -am "Initial record"

OK, so my live site is still chugging away, but now it's also a repository. Now we can make a branch for testing.

Setting up the test instance

First I cd to my testing directory. To make a branch of the live project, I could use darcs get. However, since the repository is self-contained, it's actually as easy in this case to do a simple copy:

> cd /django/testing
> cp -r /django/live/mysite .

OK, we now have an identical copy of our project. Let's run the development server to test it out.

> ./manage.py runserver
Validating models...
0 errors found.

Django version 0.96-pre, using settings 'mysite.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[11/Mar/2007 19:41:58] "GET / HTTP/1.1" 200 1138

That last line is a log entry showing my successful request for my site's home page. Sweet!

Making changes

Now it's time to start developing. Let's make a very simple change to try things out. I never customized the time zone in settings.py, so it's still set to the default of "America/Chicago". Let's fix that. I edit the file and change "America/Chicago" to "America/New_York". (I'm not in New York, but I'm in its time zone.)

Now let's have darcs show us the change.

> darcs whatsnew
{
hunk ./settings.py 17
-TIME_ZONE = 'America/Chicago'
+TIME_ZONE = 'America/New_York'
}

This is darcs' native patch format. In most terminals, it will be displayed with nice ANSI coloring. Your _darcs/patches directory is full of files that look like this (gzipped by default, to save space).
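
The format is simple enough to pick apart with a few lines of code. Here is a rough sketch that reads text shaped like the whatsnew output above and collects the hunks. It is based only on that one example: it understands "hunk" entries and nothing else, so other kinds of changes (such as adddir) are silently skipped. The Hunk class and parse_hunks function are names I made up.

```python
import re
from dataclasses import dataclass, field

HUNK_HEADER = re.compile(r"^hunk (?P<path>\S+) (?P<line>\d+)$")


@dataclass
class Hunk:
    path: str
    line: int
    removed: list = field(default_factory=list)
    added: list = field(default_factory=list)


def parse_hunks(text):
    """Collect hunks from text shaped like the whatsnew output."""
    hunks = []
    for raw in text.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            hunks.append(Hunk(header["path"], int(header["line"])))
        elif hunks and raw.startswith("-"):
            hunks[-1].removed.append(raw[1:])
        elif hunks and raw.startswith("+"):
            hunks[-1].added.append(raw[1:])
    return hunks


SAMPLE = """\
{
hunk ./settings.py 17
-TIME_ZONE = 'America/Chicago'
+TIME_ZONE = 'America/New_York'
}
"""

for hunk in parse_hunks(SAMPLE):
    print(hunk.path, hunk.line, hunk.removed, hunk.added)
```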

If you need it to, darcs can also show you a plain old diff:

> darcs diff
diff -rN old-mysite/settings.py new-mysite/settings.py
17c17
< TIME_ZONE = 'America/Chicago'
---
> TIME_ZONE = 'America/New_York'

Or a unified diff:

> darcs diff -u
diff -rN -u old-mysite/settings.py new-mysite/settings.py
--- old-mysite/settings.py      Mon Mar 12 06:10:18 2007
+++ new-mysite/settings.py      Mon Mar 12 06:10:18 2007
@@ -14,7 +14,7 @@

 # Local time zone for this installation. All choices can be found here:
 # http://www.postgresql.org/docs/current/static/datetime-keywords.html#DATETIME-TIMEZONE-SET-TABLE
-TIME_ZONE = 'America/Chicago'
+TIME_ZONE = 'America/New_York'

 # Language code for this installation. All choices can be found here:
 # http://www.w3.org/TR/REC-html40/struct/dirlang.html#langcodes
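
If the unified format is new to you, Python's standard difflib module produces the same kind of output, which makes it easy to experiment. This sketch diffs a three-line stand-in for settings.py (not the real file), so the line numbers in its output will differ from the darcs output above.

```python
import difflib

OLD = [
    "# Local time zone for this installation.",
    "TIME_ZONE = 'America/Chicago'",
    "",
]
NEW = [line.replace("America/Chicago", "America/New_York") for line in OLD]


def unified(old_lines, new_lines, name="settings.py"):
    """Return a unified diff between two lists of lines (no line endings)."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="old-mysite/" + name,
            tofile="new-mysite/" + name,
            lineterm="",
        )
    )


print("\n".join(unified(OLD, NEW)))
```

Lines starting with "-" come from the old version, lines starting with "+" from the new one, and lines starting with a space are unchanged context.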

OK, enough playing around, let's test the change.

> ./manage.py runserver
...
[11/Mar/2007 20:43:14] "GET / HTTP/1.1" 200 1138

Good -- notice the logged time is now an hour later. That's what we wanted to see.
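
Why exactly an hour? The same instant simply reads one hour later on an Eastern clock than on a Central one. A quick sketch with the standard datetime module shows it. I'm assuming fixed offsets here (UTC-5 for Central and UTC-4 for Eastern, since US daylight saving time had just started on March 11, 2007) rather than looking the zones up by name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed daylight-saving offsets for 11 March 2007.
CENTRAL = timezone(timedelta(hours=-5), "CDT")
EASTERN = timezone(timedelta(hours=-4), "EDT")


def same_instant_in(moment, zone):
    """Render an aware datetime in another zone without changing the instant."""
    return moment.astimezone(zone)


# The timestamp of the first logged request, read as Chicago time.
logged_in_chicago = datetime(2007, 3, 11, 19, 41, 58, tzinfo=CENTRAL)
logged_in_new_york = same_instant_in(logged_in_chicago, EASTERN)
print(logged_in_new_york.strftime("%H:%M:%S"))
```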

In practice you'll probably want to keep the development server running while you work, rather than running and stopping it manually. I have an older post on using GNU screen for that.

Now that we're happy with the result, we can record the change.

> darcs rec
hunk ./settings.py 17
-TIME_ZONE = 'America/Chicago'
+TIME_ZONE = 'America/New_York'
Shall I record this change? (1/?)  [ynWsfqadjkc], or ? for help: y
What is the patch name? fixed time zone
Do you want to add a long comment? [yn]n
Finished recording patch 'fixed time zone'

Going live

Now the fun part -- pushing the patch to the live site.

> darcs push /django/live/mysite/

Mon Mar 12 06:18:08 PDT 2007  Paul Bissex <pb@e-scribe.com>
  * fixed time zone
Shall I push this patch? (1/1)  [ynWvpxqadjk], or ? for help: y
Finished applying...

Here's another place where darcs' interactivity shines. This stage lets me confirm that I really want to push a particular change out to the live site. If I had recorded three different sets of changes -- three patches -- I could decide to apply any or all or none of them here. This is known as "cherry-picking".

It's only necessary to specify the target of the push the first time. After that, it becomes the default, and a simple darcs push will do the right thing.

Since my live site is Apache + mod_python, I need to restart the Apache server to make sure the changes to the code take effect:

> sudo apachectl graceful

That's it! To streamline, you could write a script that combined the darcs push and server-restart steps, perhaps even running your test suite before the push.
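
Here is one way such a script might look, as a sketch rather than a finished tool. It assumes the project has a test suite that "manage.py test" can run, that "python" is the right interpreter name on your system, and that the paths match the layout used in this article. Note that passing --all to darcs push answers yes for every patch, so this version gives up the interactive cherry-picking; drop that flag if you want to keep the prompt.

```python
import subprocess

LIVE_REPO = "/django/live/mysite/"       # push target used in this article
TESTING_REPO = "/django/testing/mysite"  # where patches are recorded


def deploy_steps(live_repo=LIVE_REPO):
    """Commands to run, in order; each must succeed before the next starts."""
    return [
        ["python", "manage.py", "test"],        # assumes the project has tests
        ["darcs", "push", "--all", live_repo],  # --all skips the per-patch prompt
        ["sudo", "apachectl", "graceful"],
    ]


def deploy(run=subprocess.run, cwd=TESTING_REPO, live_repo=LIVE_REPO):
    """Run each step from the testing checkout, stopping at the first failure.

    `run` is injectable so the sequence can be exercised without touching
    darcs or Apache. Returns True when every step exits with status 0.
    """
    for command in deploy_steps(live_repo):
        result = run(command, cwd=cwd)
        if result.returncode != 0:
            print("Stopped: %s failed" % " ".join(command))
            return False
    return True

# To use it for real, call deploy() from a script of your own.
```

Stopping at the first failure is the point of the design: a failing test run means nothing gets pushed, and a failed push means Apache is left alone.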

Summary

Here's the cheat-sheet version of this tutorial. To set things up, do these steps:

> cd /django/live/mysite
> darcs init
> darcs add -r * 
> darcs rec -am "Initial import"
> cd /django/testing
> cp -r /django/live/mysite .
> cd mysite

To work on your site, repeat these steps:

... make changes, test using development server ...
> darcs rec -am "Name for my patch"
> darcs push
... restart live web server (if needed) ...

Of course, you will often make many patches before pushing changes out to the live server.

Now imagine that instead of a trivial change like the time zone edit, you were doing something more complex -- adding a new view, or refactoring some existing code. If you were to make a mistake, even completely breaking things, the breakage would only be visible in this private testing branch, not on the live site.

Even better, imagine making a change that you decided you wanted to remove only after deploying it to the live site. With darcs rollback you could undo the effects of the offending patch, and re-push to the live site.

And with darcs' cherry-picking abilities, you can also keep specific patches out of the live site even after you've recorded them on the testing side.

I hope this has been clear and helpful. Please post comments letting me know if you found it useful or if there are points that need to be clarified. And have fun!


Monday, March 12th, 2007
13 comments

Comment from Noah Aboussafy , later that day

Why did you choose darcs over bazaar or another DVCS?

Perhaps it would be a good idea to mention that with a DVCS you're kind of pootched if your hard drive fails unless you've pushed it to a server.

Also with a centralized repo you can check out new copies on machines you've manged etc. I think DVCS has a long way to come before I use it.

Comment from Paul , later that day

Noah:

I looked at both Bazaar and Mercurial, partly because I have a bias toward all things Python. Of course by saying anything critical here I invite a pointless and boring war-o-the-DVCSs thread, but: in the end I decided against Bazaar because it felt a bit too complex and a bit too unfinished, and against Mercurial because of one too many references to renaming-related problems (the very class of problem that drove me from CVS to svn). I haven't written either one off permanently, though.

Your other points don't make much sense to me. Whether a repo is local or remote has nothing to do with the type of version control used. You can have a svn repo and checkout living on the same machine; you can have a darcs repo and checkout (branch) on separate machines. Whether you're "pootched" depends entirely on how you manage and back up your data, not on your choice of version control software.

Comment from Max Battcher , later that day

Nice article. The only thing that I would point out is that on a single system darcs get is slightly more efficient than cp -r because it will (if it can) link/symlink the patch files instead of duplicating them. Not a huge difference for a new repository, but something more useful for repositories with a large history. (Down the road you can ask darcs to relink patches between local repositories with the darcs optimize command.)

Comment from limodou , later that day

Good article! I think it's useful!

Comment from Chui Tey , later that day

Unfortunately, getting file histories out of darcs is not supported. This really takes the versioning out of version control.

Comment from Šime , later that day

Excellent choice, darcs is amazing, it's also my choice over bazaar and mercurial. Nice article, thanks.

Comment from Paul , later that day

Thanks for all the comments!

Max -- You're absolutely right. In this case I chose cp mostly to simplify the tutorial -- I would have had to re-do the boring-file and email steps because darcs get doesn't copy prefs. Also, cp preserves the exec bits on manage.py.

Chui -- I do sometimes wish for rev numbers. In practice, the patch-based system is working well for me. Tagging helps too.

Comment from Lorenzo Bolognini , 1 day later

I honestly fail to see the advantage of the "theory of patches" and how this is better than SVN revisions.

You can still run your web app off an SVN working copy, you can put some settings to tell it what db strings to use if the machine name == "Production" or something like that and you can have svn hooks run the test suite after every commit.

Comment from Paul , 2 days later

Lorenzo -- This is not "darcs vs. svn"! Yes, you can do this stuff using Subversion -- and I do. I also have been experimenting with using darcs for the same task, because of some of the conveniences I mention in the article -- for example, the ability to cherry-pick patches at push-time.

I feel it's incumbent on me as a developer to know at least one centralized and one decentralized SCM system, and Subversion and darcs are my respective picks. No showdown intended. Use what works for you.

Comment from s , 1 week later

Darcs theory of patches allows you to merge patch A, then B, then take out A and it will adjust the line numbers to suit.

But, doesn't that risk leaving you with code that doesn't run (e.g. patch A renames a variable that patch B uses)?

Does that effect render cherry-picking somewhat useless in practise?

Comment from frits , 1 week later

Thanks for the tutorial. One question: shouldn't 'darcs push /django/projects/mysite/' be 'darcs push /django/live/mysite/'?

Comment from Paul , 1 week later

Good catch! Fixed.

Re the question from s above: Honestly, I haven't used darcs long enough or on enough different things to have encountered that problem at all. You're right that it's possible. I'd file it under "with great power comes great responsibility" -- patches that modify a discrete piece of functionality are certainly better candidates for cherry-picking than "cleaned up a bunch of stuff" kind of patches.

For what it's worth, darcs also has a special token-renaming type of patch that might be a good fit for the scenario you describe. Haven't used it myself though.

Comment from Neebone , 5 weeks later

Regarding the cherry-picking, Darcs allows you to create dependencies among patches - so if you applied patches A to D with B depending on C, if you remove C, B should also be pulled out.

In any event, you may have patches that are unrelated, separate functionality etc, and cherry-picking would allow you to remove any of those patches if you wished.
