A Mercurial mirror of Django's Subversion repository

Update: As of 2012, the primary Django repo is on GitHub. The mirror described in this post has been retired.

Just wanted to post a quick note that I’m now publishing an experimental Mercurial mirror of the Django source code repository, including all tags and branches and even the djangoproject.com website source itself. Tom Tobin at The Onion has been maintaining a similar mirror of Django trunk for a while (and very helpfully answered some of my questions in IRC), but I wanted to do the whole tree.

It’s an experiment, which is to say it might go away at any time, so be warned! And let me know if you think it’s useful. Components in the chain include svnsync, hgwebdir, Apache mod_cache, and even Pygments for source code colorizing. It’s updated once per hour.

My earlier experiments along these lines used hgsvn, which is cool because it tags each commit with the corresponding svn rev number. Unfortunately, hgsvn basically ground to a halt while trying to digest all 7000 revs, so I switched to Mercurial’s built-in “convert” command.

Mercurial doesn’t yet support partial-tree cloning, so if you want your own copy you’re going to be fetching the whole thing! It takes up about 350MB, which isn’t bad considering it includes all 7000+ changesets.

Have fun!

masklinn commented :

That’s really cool. I was using git-svn to track Django, but I may switch to that instead.

Thomas Capricelli commented :

Hi Paul, thanks for this mirror. I was actually looking for one, and I was not aware of the other one you mention. That said, I find it odd to have everything in the same repository. As you say, it’s not possible to check out a partial tree with Mercurial, and it seems natural to split things into several smaller directories. Can’t your ‘svn import script’ do that? Svn can do partial checkout/update.

Nicholas Riley commented :

I do this with hgsvn and svnsync (hg convert failed on me, last time I tried), but having multiple trees means you can’t track branch merges in the same repo. I now regret doing it.

Thomas Capricelli commented :

Of course you want to keep tags and branches, but you can still have one repository for the website and another for the source. I successfully imported svn repositories and kept branches and tags. It was not easy though. I’ve tried a lot of different ways to import svn into Mercurial, and the one included in Mercurial was the best. (Although I had to run Mercurial from its own repository, since the converter had not been released yet.)

Paul commented :

Hi Thomas, the challenge is: what’s the right point at which to break out separate repos? Going one level deeper than I have gone (i.e. separate repos for django and djangoproject.com) makes sense, but it doesn’t really make the main repo much smaller, and it’s still got every branch and tag ever made. Or the next level down (separate repos for branches/tags/trunk)? That would be much more satisfying to people who just want to track trunk especially, but as Nicholas points out it can break merge tracking (though I admit I haven’t thought that part through).

As for my “svn import script”, there’s nothing custom. I maintain a local mirror of the main Django repo using svnsync, and update the Mercurial repo via hg convert in a cronjob.
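For anyone who wants to try the same arrangement, here is a minimal sketch of what one update pass might look like as a small Python wrapper. It is not the actual cron job behind this mirror. The mirror URL, the Mercurial repo path, and the function names are all made up for illustration, and it assumes the svnsync mirror has already been initialized and that Mercurial’s convert extension is available.

```python
import subprocess

# Hypothetical locations: the real paths used for the mirror are not given in the post.
SVN_MIRROR_URL = "file:///srv/mirrors/django-svn"
HG_REPO_PATH = "/srv/hg/django"


def build_commands(mirror_url=SVN_MIRROR_URL, hg_repo=HG_REPO_PATH):
    """Return the two commands for one update pass, in the order they should run."""
    return [
        # Step 1: pull any new upstream revisions into the local svnsync mirror.
        ["svnsync", "sync", mirror_url],
        # Step 2: convert the mirror into the Mercurial repo. Re-running
        # "hg convert" against the same destination only adds new revisions.
        # The --config option enables the bundled convert extension for this call.
        ["hg", "--config", "extensions.convert=", "convert", mirror_url, hg_repo],
    ]


def update_mirror(run=subprocess.check_call):
    """Run each command in order, stopping at the first failure.

    The "run" parameter exists so the commands can be inspected or logged
    instead of executed; by default they are run for real.
    """
    for cmd in build_commands():
        run(cmd)
```

Calling update_mirror() from a script scheduled once per hour would match the update frequency mentioned in the post. Passing run=print is a cheap way to do a dry run and see what would be executed.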

Horst Gutmann commented :

Great, thank you :D

Just a question: How do you deploy hgwebdir? CGI, FastCGI, WSGI? Just curious.

Paul commented :

Horst: It’s plain CGI with Apache mod_cache in front. The caching is crucial, especially with the addition of Pygments rendering for every source file.

Re the earlier questions about structure, I’m playing with other arrangements, so you may see individual branch repos start showing up as well.

Thomas Capricelli commented :

Hi Paul. I see you’re experimenting. There’s currently one repository for one branch.

I still have the impression that it would be better to have one repository for the code and another for the website. It really makes sense to have all branches in the same repository, and Mercurial is well optimized for big projects. I’m using the kernel Mercurial repository and I’ve never had any problem. Where do you think the problem would be? Bandwidth for your server? For the user? Disk storage for the user?

Paul commented :

My main concerns are usability, maintainability (for me), and potential bandwidth consumption. Most people really only want one or two branches, which are much much smaller when broken out separately.

I’m now looking at a third option, a single repo with selected branches (Mercurial “named branches”) via hgsvn.

Tane commented :

As a Django and Mercurial user, you might be interested in having a look at http://hg.sharesource.org/hgfront. It’s an application we’re developing in Django at the moment to manage your local and remote repositories.

We’re happy to get feedback and accept new ideas, and we’re quite close to our first public release.