E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you've never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.

Book

I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!

Colophon

Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.

Stuff I Use

bitbucket, Django, Emacs, FreeBSD, Git, jQuery, LaunchBar, Markdown, Mercurial, OS X, Python, Review Board, S3, SQLite, Sublime Text, Ubuntu Linux

Spam Report

At least 236,429 pieces of comment spam killed since 2008, mostly via Akismet.

robots.txt via Django, in one line

A significant difference between developing Django sites and static-HTML-based approaches (among which I count PHP and the like) is that static files, aka "media", live in a dedicated spot.

Sometimes you need a piece of static content to be available at a specific URL outside your media root -- robots.txt, for example. This can be done in pure Django (i.e. without even touching your Apache configuration), and is especially nice if your robots.txt content is short. The example below serves a basic "keep out" configuration.

At the top of your root URLconf, add this import:

from django.http import HttpResponse

and below, among your list of URL patterns, add:

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain"))

The lambda r bit is a concise way of creating a function object which accepts (and discards) the HttpRequest object that Django provides to all views. The "mimetype" setting (aka "content_type" in Django 1.0) is important too, because robots don't like text/html.

So there you have it -- a classic one-line (plus an import) robots.txt solution.
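If you're on Django 1.0 or later, the same line works with the newer content_type keyword -- a sketch of the equivalent, assuming you've kept the URLconf line above otherwise unchanged:

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", content_type="text/plain"))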

Saturday, April 25th, 2009
8 comments

Comment from Adrian Holovaty, later that day

Hey Paul,

Small suggestion: change the regex to anchor it to the end of the string and escape the dot:

r'^robots\.txt$'

Another way of doing this is the direct_to_template generic view. It would be more than one line, but it would give you a little more flexibility in changing the robots.txt content.
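Roughly like this -- a sketch assuming a robots.txt template in your templates directory:

from django.views.generic.simple import direct_to_template

(r'^robots\.txt$', direct_to_template, {'template': 'robots.txt', 'mimetype': 'text/plain'})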

Comment from Paul, later that day

Adrian -- good points both. I've updated the post with the more correct regex.

Comment from James Bennett, later that day

There's also a Django application floating around that lets you use the admin to manage the rules which will be placed in the robots.txt file:

http://github.com/jezdez/django-robots/tree/master

Comment from Paul, 2 days later

James -- thanks for that. The one-liner method is good for absurdly short robots.txt files only!

Comment from django clark, 13 weeks later

I only use robots.txt to block my admin section, so this works for me.

Comment from Paul, 13 weeks later

Clark, just remember that robots.txt doesn't really "block" anything -- it just tells well-behaved search spiders what you don't want indexed.

If your goal is to keep your admin page out of Google, then adding it to robots.txt will work -- though I'm guessing there are no links to it, so Google wouldn't know about it anyway. If your goal is to increase security, you should think again.
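For instance, to merely discourage indexing of the admin, the one-liner version would look something like this -- assuming the default /admin/ prefix, which your site may not use:

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /admin/", mimetype="text/plain"))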

(Apologies if you know all this already.)

Comment from Mario, 19 months later

Thanks for your post! But perhaps it is better to serve the static robots.txt file with Apache.

For example, by adding the following lines to httpd.conf. ;)

LoadModule alias_module modules/mod_alias.so
Alias /robots.txt /full/path/to/robots.txt

<Location "/robots.txt">
SetHandler None
</Location>

Comment from eng. Ilian Iliev, 3 years later

Wow, that is a really short one. I'll definitely try it.

Comments are closed for this post. But I welcome questions/comments via email or Twitter.