E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.

Book

I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!

Colophon

Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.

A Mercurial mirror of Django's Subversion repository

Update: The mirror described in this post has been retired. Django source now lives on GitHub.

Just wanted to post a quick note that I'm now publishing an experimental Mercurial mirror of the Django source code repository, including all tags and branches and even the djangoproject.com website source itself. Tom Tobin at The Onion has been maintaining a similar mirror of Django trunk for a while (and very helpfully answered some of my questions in IRC), but I wanted to do the whole tree.

It's an experiment, which is to say it might go away at any time, so be warned! And let me know if you think it's useful. Components in the chain include svnsync, hgwebdir, Apache mod_cache, and even Pygments for source code colorizing. It's updated once per hour.

My earlier experiments along these lines used hgsvn, which is cool because it tags each commit with the corresponding svn rev number. Unfortunately, hgsvn basically ground to a halt while trying to digest all 7000 revs, so I switched to Mercurial's built-in "convert" command.
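As a rough sketch of how an hourly update like this might be wired together (not the actual script behind the mirror): one pass first brings a local svnsync mirror up to date, then runs Mercurial's convert over it. The paths and the names `build_update_commands` and `run_update` are hypothetical, and the sketch assumes the svnsync mirror and the Mercurial destination have already been initialized.

```python
import subprocess

# Hypothetical locations; the post does not give the real paths.
SVN_MIRROR_URL = "file:///var/mirrors/django-svn"
HG_REPO_PATH = "/var/mirrors/django-hg"


def build_update_commands(svn_mirror_url, hg_repo_path):
    """Return the two commands for one update pass, in the order they run."""
    return [
        # Pull any new upstream revisions into the local svnsync mirror.
        ["svnsync", "synchronize", svn_mirror_url],
        # Convert revisions from the mirror into the Mercurial repository.
        # "--config extensions.convert=" enables the bundled convert extension.
        ["hg", "--config", "extensions.convert=", "convert",
         svn_mirror_url, hg_repo_path],
    ]


def run_update(svn_mirror_url=SVN_MIRROR_URL, hg_repo_path=HG_REPO_PATH,
               runner=subprocess.run):
    """Run one update pass, stopping at the first command that fails."""
    for command in build_update_commands(svn_mirror_url, hg_repo_path):
        runner(command, check=True)
```

A cron entry that calls `run_update()` once an hour would match the update schedule described above. Passing the runner in as a parameter is only there so the command sequence can be checked without touching a real repository.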

Mercurial doesn't yet support partial-tree cloning, so if you want your own copy you're going to be fetching the whole thing! It takes up about 350MB, which isn't bad considering it includes all 7000+ changesets.

Have fun!

Wednesday, February 6th, 2008
10 comments

Comment from masklinn, later that day

That's really cool. I was using git-svn to track Django, but I may switch to that instead.

Comment from Thomas Capricelli, later that day

Hi Paul, thanks for this mirror. I was actually looking for one, and I was not aware of the other one you mention. I do find it odd, though, to have everything in the same repository. As you say, it's not possible to check out a partial tree with Mercurial, so it seems natural to split it into several smaller directories. Can't your 'svn import script' do that? Svn can do partial checkout/update.

Comment from Nicholas Riley, later that day

I do this with hgsvn and svnsync (hg convert failed on me, last time I tried), but having multiple trees means you can't track branch merges in the same repo. I now regret doing it.

Comment from Thomas Capricelli, later that day

Of course you want to keep tags and branches, but you can still have one repository for the website and another for the source. I successfully imported svn repositories and kept branches and tags. It was not easy, though. I've tried a lot of different ways to import svn into Mercurial, and the one included in Mercurial was the best (although I had to use the version from the Mercurial repository, since it had not been released yet).

Comment from Paul, later that day

Hi Thomas, the challenge is: what's the right point at which to break out separate repos? Going one level deeper than I have gone (i.e. separate repos for django and djangoproject.com) makes sense, but it doesn't really make the main repo much smaller, and it's still got every branch and tag ever made. Or the next level down (separate repos for branches/tags/trunk)? That would be much more satisfying to people who just want to track trunk especially, but as Nicholas points out it can break merge tracking (though I admit I haven't thought that part through).

As far as my "svn import script", there's nothing custom. I maintain a local mirror of the main Django repo using svnsync, and update the Mercurial repo via hg convert in a cronjob.

Comment from Horst Gutmann, later that day

Great, thank you :D

Just a question: How do you deploy hgwebdir? CGI, FastCGI, WSGI? Just curious.

Comment from Paul, later that day

Horst: It's plain CGI with Apache mod_cache in front. The caching is crucial, especially with the addition of Pygments rendering for every source file.

Re the earlier questions about structure, I'm playing with other arrangements, so you may see individual branch repos start showing up as well.

Comment from Thomas Capricelli, 1 day later

Hi Paul. I see you're experimenting. There's currently one repository per branch.

I'm still under the impression that it would be better to have one repository for the code and another one for the website. It really makes sense to have all branches in the same repository, and Mercurial is quite well optimized for big projects. I'm using the kernel Mercurial repository and I've never had any problem. Where do you think the problem would be? Bandwidth for your server? For the user? Disk storage for the user?

Comment from Paul, 1 day later

My main concerns are usability, maintainability (for me), and potential bandwidth consumption. Most people really only want one or two branches, which are much much smaller when broken out separately.

I'm now looking at a third option, a single repo with selected branches (Mercurial "named branches") via hgsvn.

Comment from Tane, 6 weeks later

As a Django and Mercurial user, you might be interested in having a look at http://hg.sharesource.org/hgfront. It's an application we're developing in Django at the moment to manage your local and remote repositories.

We're happy to get feedback and accept new ideas, and we're quite close to our first public release.

Comments are closed for this post. But I welcome questions/comments via email or Twitter.