E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.

Book

I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!

Colophon

Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.

Stuff I Use

bitbucket, Django, Emacs, FreeBSD, Git, jQuery, LaunchBar, Markdown, Mercurial, OS X, Python, Review Board, S3, SQLite, Sublime Text, Ubuntu Linux

Spam Report

At least 236528 pieces of comment spam killed since 2008, mostly via Akismet.

Mercurial: good enough for now

Lately I've been trying out the Mercurial distributed version control system on some real projects.

I currently use Subversion for production stuff at work. It's reliable, has great Trac integration, and is most likely to be known by other developers. (In fact, we hired a new person at work this fall who will be helping me with web development, and it turned out that Subversion was what he was familiar with. So I feel vindicated on that last point especially.)

But I'm totally sold on the distributed model as the future of version control.

I started playing with Darcs earlier this year; it was my introduction to distributed version control, and I liked it a lot. In particular I liked its interactivity and the ability to easily cherry-pick on two levels: among individual file changes when adding patches (darcs record, analogous to svn commit), and among individual patches when pushing to a remote repository. In the end it was a fairly minor thing that sent me looking for an alternative: I manage a lot of websites, and most of those trees have symlinks. Darcs doesn't handle symlinks. (There's also the dreaded "exponential merge" problem, but that's another story.)

So I went looking at the alternatives, and Mercurial had the best combination of 1) low suckage and 2) apparent chance of long-term success.

I could almost as easily have gone for Bazaar (bzr). I like the fact that both are written in Python (though Mercurial adds some C for speed). I believe Python encourages well-written software, and I like the idea that if I were to find a bug I might actually be able to contribute a fix. I kind of wish Bazaar didn't exist, in fact -- the support from Canonical means it won't go away any time soon, but it's not yet a no-brainer win, so it only keeps the market diffuse.

Mercurial is spiffy. It really is fast. It has a good book. It has sensible, rememberable, easily abbreviated commands that will click pretty well for someone coming from a CVS/Subversion background.

Being able to commit offline is great. I'm working on an intranet app and have copies on both an internal server and my laptop. At work, I work on the internal server. When I leave, I pull changes to my laptop copy. In off hours I hack on the local copy, then when I get back I push the changes back out. Each side has a full history. This is all basic goodness you get with any DVCS, but Mercurial does it well and I'm happy.
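
To make that round trip concrete, here is a toy model of it in Python. It is only a sketch and says nothing about how Mercurial actually stores or transfers data: the `Repo` class and its `commit`, `pull`, and `push` methods are made-up names for illustration, and "history" is just an ordered list of changeset labels. The point is simply that after pull, offline commits, and push, both sides hold the same full history.

```python
class Repo:
    """Toy repository: history is an ordered list of changeset labels."""

    def __init__(self, name):
        self.name = name
        self.history = []

    def commit(self, label):
        # Committing needs no network: it only touches this repo's history.
        self.history.append(label)

    def pull(self, other):
        # Copy over every changeset we don't have yet, keeping their order.
        for changeset in other.history:
            if changeset not in self.history:
                self.history.append(changeset)

    def push(self, other):
        # Pushing is pulling, seen from the other side.
        other.pull(self)


server = Repo("internal-server")
laptop = Repo("laptop")

server.commit("add login view")       # at work, on the internal server
laptop.pull(server)                   # before leaving
laptop.commit("fix form validation")  # off hours, no connection needed
laptop.commit("tidy templates")
laptop.push(server)                   # back at work

print(server.history == laptop.history)
```

A real DVCS also has to deal with both sides changing at once (that is where merging comes in), which this sketch deliberately leaves out.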

Enough of the love; let's move on to some petty complaints.

Mercurial warts

Now, I know that developers and lovers of Mercurial have defenses for these, but regardless, here are some minor things I wish were different:

Update: I realized I should make a correction to that last one to account for the efforts of Bryan O'Sullivan who, between his Google tech talk and his book, has done a huge amount to make Mercurial's merits more widely known.

Last words

If you're on a quest for a distributed version control system, I have one piece of advice, which is not to believe people who say that it doesn't matter what you choose because they all pretty much work the same way and any decent developer can learn how to use one in five minutes. I agree you can probably learn how to check out, edit, commit, check in, and view a log in five minutes or so. But when you get into merging, branch management, rolling back changes, and other real minutiae of day-to-day work you're going to be doing some actual learning about the mechanics of your particular system. I have a decent, but by no means masterful, grasp of Subversion, Darcs, and Mercurial (and CVS, but I'm trying to forget that); each took some study. If you think they're all the same then you're going to be very confused about stuff like "darcs push" applying patches to the remote working directory when "hg push" doesn't.
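
The "darcs push" versus "hg push" difference can be illustrated with a small toy model. This is a sketch under loud assumptions: `ToyRepo`, `push_history_only`, and `push_and_apply` are hypothetical names, and neither tool is implemented anything like this. It models only the behavior described above, where one style of push changes the remote's checked-out files and the other just delivers history and leaves updating the files as a separate step.

```python
class ToyRepo:
    """Toy repository with a history and a working-directory revision."""

    def __init__(self):
        self.history = []        # changesets stored in the repository
        self.working_dir = None  # changeset the checked-out files reflect

    def commit(self, label):
        self.history.append(label)
        self.working_dir = label

    def update(self):
        # Bring the checked-out files up to the newest changeset.
        self.working_dir = self.history[-1] if self.history else None


def push_history_only(src, dst):
    """Models the "hg push" behavior described above: history moves, files don't."""
    for changeset in src.history:
        if changeset not in dst.history:
            dst.history.append(changeset)


def push_and_apply(src, dst):
    """Models the "darcs push" behavior described above: files change too."""
    push_history_only(src, dst)
    dst.update()


local = ToyRepo()
local.commit("r1")

remote_hg_style = ToyRepo()
push_history_only(local, remote_hg_style)
# The remote now has the changeset, but its files are untouched.
print(remote_hg_style.history, remote_hg_style.working_dir)

remote_hg_style.update()  # the separate step, like running "hg update" there
print(remote_hg_style.working_dir)

remote_darcs_style = ToyRepo()
push_and_apply(local, remote_darcs_style)
print(remote_darcs_style.working_dir)
```

If you deploy a website by pushing to it, that separate update step is exactly the kind of detail that bites you when you assume the tools are interchangeable.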

However, don't let that paralyze you. I'm just making an argument for valuing your own time and not underestimating what it takes to be productive. The other natural worry, lock-in, seems to be turning into almost a non-issue. The tools for getting stuff from one VCS to another have made great progress in the past couple years. Bazaar has good Subversion integration; Mercurial has hgsvn; and Tailor can bridge almost any gap.

Update: There's also some interesting discussion in the reddit thread for this post.

Saturday, September 8th, 2007
7 comments

Comment from Bryan O'Sullivan, later that day

Nice article. I have a few comments on your observations of "warts", though.

  • Directory handling. Yes, Mercurial can't store empty directories. But putting a hidden file in an empty directory and adding it takes about 5 seconds longer than "svn add emptydir". This choice massively simplifies the implementation, which makes the software more reliable, and does so at almost no practical cost. Yes, it's a tradeoff, but a good one.

  • Directory renaming. Mercurial actually does a fantastic job on this: it gets all the same cases correct as systems that explicitly handle files and directories as persistent first-class objects, but it's even more flexible. You can not only rename files and directories; you can copy them too. When you copy a file and merge with someone else, their changes will show up in both the original and copied files, something that doesn't happen with other tools. (Subversion completely loses changes when you merge renamed files. I don't know if 1.5 will fix this, but it's a really nasty corner case: the insidious bug that bites you 0.5% of the time.)

  • .hgfoo files. This seems more like a matter of taste than anything else.

  • Adding a default .hgignore file would be cute, it's true, but ignore files actually affect performance: you don't want to ignore a huge pile of patterns that will never show up in your repository, because that will slow down file name matching.

  • Marketing. It speaks volumes to me that some big, serious projects have chosen Mercurial in spite of a relative lack of marketing. The software markets itself.
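
A minimal sketch of the hidden-file workaround from the first bullet, for trees with many empty directories. The function name `add_placeholders` and the placeholder name `.keep` are assumptions chosen for illustration; Mercurial does not prescribe either. The script only creates the files, so they would still need to be added to the repository afterwards (for example with "hg add").

```python
import os
import tempfile


def add_placeholders(root, name=".keep"):
    """Create an empty placeholder file in every empty directory under root.

    Skips Mercurial's metadata directory (.hg). Returns the created paths.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a directory holding only
        # .hg is not mistaken for an empty one.
        is_empty = not dirnames and not filenames
        if ".hg" in dirnames:
            dirnames.remove(".hg")  # don't descend into repository metadata
        if is_empty:
            path = os.path.join(dirpath, name)
            open(path, "w").close()
            created.append(path)
    return created


# Demo on a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "media", "uploads"))
    os.makedirs(os.path.join(root, "logs"))
    open(os.path.join(root, "README"), "w").close()
    created = add_placeholders(root)
    print(sorted(os.path.relpath(p, root) for p in created))
```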

Comment from Paul, later that day

Thanks for the pointer to the record extension -- I look forward to seeing it in the next Mercurial release.

And as I said above, thanks for the book too! It's been very helpful.

Comment from nirs, later that day

It is not clear why you choose Mercurial over Bazaar. Mercurial is faster, but if your repo is not Mozilla or Linux kernel size, it does not matter.

I would never use a system that make "tradeoffs" about my data, for example empty directories. It is just not acceptable. A vcs should store exactly the data I give it.

Comment from Paul, later that day

Speed certainly isn't my primary concern, it's just an enjoyable bonus. My comment about hg being fast was meant in reference to my current svn setup, not bzr.

Like I said, I could have ended up choosing bzr, but for now I'm going with hg. I love learning cool new tools but I only have so much time for that. I'll likely revisit things when both have reached 1.0.

Comment from Jonathan Ellis, later that day

"Mercurial is faster, but if your repo is not Mozilla or Linux kernel size, it does not matter."

I disagree. The usability of features like grep and bisect is significantly impacted by speed even on smaller repositories.

"I would never use a system that make “tradeoffs” about my data, for example empty directories. It is just not acceptable."

Now you're just being silly. Any project with non-infinite development resources (that means all of them :) makes tradeoffs. Bazaar is no exception. (Incidentally, git also does not store empty directories, so with the two leading dscms making the same design decision, I don't think it's as unacceptable as you are trying to argue.)

Comment from nirs, later that day

You should not trade correctness for speed.

Comment from Flandry, 7 weeks later

Thanks for this blog. I have been looking for a revision control system and had assumed SVN was the way to go, but after googlebumping into a discussion of Hg i tracked it down. Your assessment of it and preference to bzr was the final nail for me: i'm going to try Hg. I've observed enough chinks in the Canonical software bulldozer in the context of Ubuntu to feel better trusting software not driven by their "sponsorship".

cheers

Comments are closed for this post. But I welcome questions/comments via email or Twitter.