Mercurial: good enough for now

September 08, 2007

Lately I’ve been trying out the Mercurial distributed version control system on some real projects.

I currently use Subversion for production stuff at work. It’s reliable, has great Trac integration, and is most likely to be known by other developers. (In fact, we hired a new person at work this fall who will be helping me with web development, and it turned out that Subversion was what he was familiar with. So I feel vindicated on that last point especially.)

But I’m totally sold on the distributed model as the future of version control.

I started playing with Darcs earlier this year; it was my introduction to distributed version control, and I liked it a lot. In particular I liked its interactivity and the ability to easily cherry-pick on two levels: among individual file changes when adding patches (darcs record, analogous to svn commit), and among individual patches when pushing to a remote repository. In the end it was a fairly minor thing that sent me looking for an alternative: I manage a lot of websites, and most of those trees have symlinks. Darcs doesn’t handle symlinks. (There’s also the dreaded “exponential merge” problem, but that’s another story.)

So I went looking at the alternatives, and Mercurial had the best combination of 1) low suckage and 2) apparent chance of long-term success.

I could almost as easily have gone for Bazaar (bzr). I like the fact that both are written in Python (though Mercurial adds some C for speed). I believe Python encourages well-written software, and I like the idea that if I were to find a bug I might actually be able to contribute a fix. I kind of wish Bazaar didn’t exist, in fact – the support from Canonical means it won’t go away any time soon, but it’s not yet a no-brainer win, so it only keeps the market diffuse.

Mercurial is spiffy. It really is fast. It has a good book. It has sensible, memorable, easily abbreviated commands that will click pretty well for someone coming from a CVS/Subversion background.

Being able to commit offline is great. I’m working on an intranet app and have copies on both an internal server and my laptop. At work, I work on the internal server. When I leave, I pull changes to my laptop copy. In off hours I hack on the local copy, then when I get back I push the changes back out. Each side has a full history. This is all basic goodness you get with any DVCS, but Mercurial does it well and I’m happy.
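For the curious, that commute routine can be written down as a minimal sketch in Python. The repository URL and the function names here are hypothetical, made up for illustration; the only real pieces are the hg pull --update and hg push commands themselves.

```python
import subprocess

# Hypothetical location of the work copy; substitute your own repository URL.
SERVER_REPO = "ssh://intranet.example.com/projects/app"


def leaving_work(server=SERVER_REPO):
    """Commands to run in the laptop copy before heading home."""
    # Pull the day's changesets from the server and update the working directory.
    return [["hg", "pull", "--update", server]]


def back_at_work(server=SERVER_REPO):
    """Commands to run in the laptop copy on returning to the office."""
    # Push the off-hours changesets back; this transfers history only.
    return [["hg", "push", server]]


def run(commands, cwd="."):
    """Run each command in order inside cwd, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run(leaving_work(), cwd="path/to/laptop/copy") would do the evening pull. After the morning push, the server's working directory still needs an hg update run on the server itself, because hg push does not touch the remote working directory.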

Enough of the love; let’s move on to some petty complaints.

Mercurial warts

Now, I know that developers and lovers of Mercurial have defenses for these, but regardless, here are some minor things I wish were different:

Directory handling. Mercurial can’t store empty directories, and it has to do some clever tricks to handle directory renaming. The latter is not much of an issue in my experience, but I’d like directories to be first-class citizens in my version control system just like they are in my mental map of the tree. I’d like not to have .dummy files in empty directories.
.hgfoo files. In a typical working directory I have a .hg directory and a .hgignore file and possibly a .hgtags too. I like that, unlike Subversion, Mercurial doesn’t litter my subdirectories, but in the interest of tidiness I’d like everything inside .hg. Nothing inside that directory is versioned; whether that could be changed without inelegance I don’t know. Or, perhaps, like Darcs, Mercurial could simply make storing these (unversioned) files inside .hg the default, and offer an optional mechanism for pointing to versioned files.
Default .hgignore. Darcs has a lot of sensible default behaviors, and one is that it has a fairly comprehensive “boring” file listing patterns of files to be ignored. I’d love it if hg init created my .hgignore in a similar way. How often will I really want to version *.pyc files, *~ files, or .svn directories? Not often. Yes, you can do this on a per-user basis, but for collaboration you need to have the information in the repo itself.
Marketing. I get the sense that the very idea of “marketing” Mercurial is unappealing to much of its community. That’s too bad. You can only fully participate in a meritocracy if people know you exist and what you’re good at. One day when I was telling a developer about Mercurial he said that it sounded cool, but were there any graphical clients for it? I didn’t know, but I ended up starting the listing of Mercurial GUI clients and front-end tools on their wiki because the information was so hard to find.

Update: I realized I should make a correction to that last one to account for the efforts of Bryan O’Sullivan who, between his Google tech talk and his book, has done a huge amount to make Mercurial’s merits more widely known.
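The empty-directory and default-.hgignore warts can both be papered over with a small script. What follows is a minimal sketch in Python, not anything Mercurial ships: the function names are my own, the placeholder name .dummy and the three ignore patterns are simply the examples mentioned above, and it assumes the glob syntax for .hgignore.

```python
import os

PLACEHOLDER = ".dummy"  # name used in this post; any file name would do
IGNORE_PATTERNS = ["*.pyc", "*~", ".svn"]  # only the examples named in this post


def add_placeholders(root):
    """Create an empty placeholder file in every empty directory under root."""
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        is_empty = not dirnames and not filenames
        # Never descend into Mercurial's own metadata directory.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        if is_empty:
            path = os.path.join(dirpath, PLACEHOLDER)
            open(path, "w").close()
            created.append(path)
    return created


def write_default_hgignore(root):
    """Write a starter .hgignore into root unless one already exists."""
    path = os.path.join(root, ".hgignore")
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write("syntax: glob\n")
        for pattern in IGNORE_PATTERNS:
            f.write(pattern + "\n")
    return True
```

Running both functions on a freshly initialized working directory, then adding the resulting files with hg add, gets roughly the behavior I was wishing for. The pattern list is deliberately short; a long list of patterns that never match anything is not free.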

Last words

If you’re on a quest for a distributed version control system, I have one piece of advice, which is not to believe people who say that it doesn’t matter what you choose because they all pretty much work the same way and any decent developer can learn how to use one in five minutes. I agree you can probably learn how to check out, edit, commit, check in, and view a log in five minutes or so. But when you get into merging, branch management, rolling back changes, and other real minutiae of day-to-day work you’re going to be doing some actual learning about the mechanics of your particular system. I have a decent, but by no means masterful, grasp of Subversion, Darcs, and Mercurial (and CVS, but I’m trying to forget that); each took some study. If you think they’re all the same then you’re going to be very confused about stuff like “darcs push” applying patches to the remote working directory when “hg push” doesn’t.

However, don’t let that paralyze you. I’m just making an argument for valuing your own time and not underestimating what it takes to be productive. The other natural worry, lock-in, seems to be turning into almost a non-issue. The tools for getting stuff from one VCS to another have made great progress in the past couple of years. Bazaar has good Subversion integration; Mercurial has hgsvn; and Tailor can bridge almost any gap.

Update: There’s also some interesting discussion in the reddit thread for this post.

Bryan O’Sullivan commented on Sat Sep 8 23:15:21 2007:

Nice article. I have a few comments on your observations of “warts”, though.

Directory handling. Yes, Mercurial can’t store empty directories. But putting a hidden file in an empty directory and adding it takes about 5 seconds longer than “svn add emptydir”. This choice massively simplifies the implementation, which makes the software more reliable, and does so at almost no practical cost. Yes, it’s a tradeoff, but a good one.
Directory renaming. Mercurial actually does a fantastic job on this: it gets all the same cases correct as systems that explicitly handle files and directories as persistent first-class objects, but it’s even more flexible. You can not only rename files and directories; you can copy them too. When you copy a file and merge with someone else, their changes will show up in both the original and copied files, something that doesn’t happen with other tools. (Subversion completely loses changes when you merge renamed files. I don’t know if 1.5 will fix this, but it’s a really nasty corner case: the insidious bug that bites you 0.5% of the time.)
.hgfoo files. This seems more like a matter of taste than anything else.
Adding a default .hgignore file would be cute, it’s true, but ignore files actually affect performance: you don’t want to ignore a huge pile of patterns that will never show up in your repository, because that will slow down file name matching.
Marketing. It speaks volumes to me that some big, serious projects have chosen Mercurial in spite of a relative lack of marketing. The software markets itself.

Paul commented on Sun Sep 9 05:57:44 2007:

Thanks for the pointer to the record extension – I look forward to seeing it in the next Mercurial release.

And as I said above, thanks for the book too! It’s been very helpful.

nirs commented on Sun Sep 9 10:54:01 2007:

It is not clear why you chose Mercurial over Bazaar. Mercurial is faster, but if your repo is not Mozilla or Linux kernel size, it does not matter.

I would never use a system that makes “tradeoffs” about my data, for example empty directories. It is just not acceptable. A vcs should store exactly the data I give it.

Paul commented on Sun Sep 9 11:36:31 2007:

Speed certainly isn’t my primary concern, it’s just an enjoyable bonus. My comment about hg being fast was meant in reference to my current svn setup, not bzr.

Like I said, I could have ended up choosing bzr, but for now I’m going with hg. I love learning cool new tools but I only have so much time for that. I’ll likely revisit things when both have reached 1.0.

Jonathan Ellis commented on Sun Sep 9 12:37:57 2007:

“Mercurial is faster, but if your repo is not Mozilla or Linux kernel size, it does not matter.”

I disagree. The usability of features like grep and bisect is significantly impacted by speed even on smaller repositories.

“I would never use a system that makes “tradeoffs” about my data, for example empty directories. It is just not acceptable.”

Now you’re just being silly. Any project with non-infinite development resources (that means all of them :) makes tradeoffs. Bazaar is no exception. (Incidentally, git also does not store empty directories, so with the two leading dscms making the same design decision, I don’t think it’s as unacceptable as you are trying to argue.)

nirs commented on Sun Sep 9 14:49:17 2007:

You should not trade correctness for speed.

Flandry commented on Fri Nov 2 12:32:42 2007:

Thanks for this blog. I have been looking for a revision control system and had assumed SVN was the way to go, but after googlebumping into a discussion of Hg I tracked it down. Your assessment of it and preference for it over bzr was the final nail for me: I’m going to try Hg. I’ve observed enough chinks in the Canonical software bulldozer in the context of Ubuntu to feel better trusting software not driven by their “sponsorship”.

cheers