E-Scribe News : a programmer’s blog

About Me

I'm Paul Bissex, and e-scribe.com is my consulting business. I build web applications using open source software, especially Django. I teach photographers web design and professional skills. In the '90s I did graphic design for newspapers and magazines. Then I wrote technology commentary and reviews for Wired, Salon.com, Chicago Tribune, and lots of little places you've never heard of. Feel free to email me.

Book

I'm co-author of "Python Web Development with Django", an excellent guide to my favorite web framework. Its strong points include an introduction to Python, and better coverage of Django 1.0 than nearly anybody else. Published by Addison-Wesley, it is available from Amazon and your favorite technical bookstore as well.

Colophon

Built using Django, served by Apache and mod_wsgi. The database is SQLite. The operating system is FreeBSD, on a VPS hosted at Johncompanies.com. Comment-spam protection by Akismet. Vintage topo imagery from the Maptech archive. The markup engine is Markdown.

Stuff I Use

Akismet, del.icio.us, Django, dpaste.com, Emacs, FreeBSD, Freenode, jQuery, LaunchBar, MacPorts, Markdown, Mercurial, OS X, Postfix, Python, SQLite, Subversion, TextMate, Trac, Ubuntu Linux, wmii

Spam Report

At least 67589 pieces of comment spam killed since January 2008, mostly via Akismet.

robots.txt via Django, in one line

A significant difference between developing Django sites and static-HTML-based approaches (among which I count PHP and the like) is that in Django, static files, aka "media", live in a dedicated spot.

Sometimes you need a piece of static content to be available at a specific URL outside your media root; robots.txt is the classic example. This can be done in pure Django (i.e. without even touching your Apache configuration), and is especially nice if your robots.txt content is short. The example below serves a basic "keep out" configuration.

At the top of your root URLconf, add this import:

from django.http import HttpResponse

and below, among your list of URL patterns, add:

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", content_type="text/plain"))

The lambda r bit is a concise way of creating a function object that accepts (and discards) the HttpRequest object Django passes to every view. The text/plain content type is important too, because robots don't like text/html, which is Django's default. (Older Django code passes it as mimetype; Django 1.0 also accepts content_type, and mimetype was removed altogether in Django 1.7.)
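To see what lambda r buys you without needing Django installed, here is a plain-Python sketch. It is a simplification: the views return a bare string standing in for the HttpResponse, an assumption made only so the snippet runs on its own.

```python
ROBOTS_TXT = "User-agent: *\nDisallow: /"

# The one-liner's style: an anonymous function whose single argument
# (the request, "r") is accepted and then ignored.
robots_lambda = lambda r: ROBOTS_TXT


# The same behavior as a named function, which is easier to grow later.
def robots_view(request):
    return ROBOTS_TXT


# Both ignore whatever request object they are handed.
print(robots_lambda(object()) == robots_view(object()))  # True
```

The named version is what you would reach for once the view needs more than one expression.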

So there you have it -- a classic one-line (plus an import) robots.txt solution.
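Whichever rule you serve, note that the original robots exclusion convention treats each Disallow value as a plain path prefix. The * wildcard in a rule like Disallow: /* is an extension that Googlebot honors but stricter parsers do not. Python's standard-library urllib.robotparser is one of the strict ones, which makes it a handy check. The host name and agent string below are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if a strict parser lets `agent` fetch `path`.

    "ExampleBot" and example.com are placeholder values.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


wildcard = "User-agent: *\nDisallow: /*"  # relies on the * extension
prefix = "User-agent: *\nDisallow: /"     # plain prefix: matches every path

print(allowed(wildcard, "/admin/"))  # True: "/*" is taken literally
print(allowed(prefix, "/admin/"))    # False
```

Disallow: / already covers every path, so the wildcard adds nothing except a dependence on crawler-specific behavior.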

Saturday, April 25th, 2009
6 comments

Comment from Adrian Holovaty, later that day

Hey Paul,

Small suggestion: change the regex to anchor it to the end of the string and escape the dot:

r'^robots\.txt$'
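Both fixes are easy to check with Python's re module. This sketch assumes, as Django's regex-based URL resolver does, that a URLconf pattern is tried with re.search against the request path minus its leading slash. The "loose" pattern is simply one missing both fixes.

```python
import re

# Unescaped dot (matches any character) and no end anchor.
loose = re.compile(r'^robots.txt')
# Escaped dot and a $ anchor: only the exact name matches.
strict = re.compile(r'^robots\.txt$')

for path in ["robots.txt", "robotsXtxt", "robots.txt.bak"]:
    print(path, bool(loose.search(path)), bool(strict.search(path)))
# robots.txt True True
# robotsXtxt True False
# robots.txt.bak True False
```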

Another way of doing this is the direct_to_template generic view. It would be more than one line, but it would give you a little more flexibility in changing the robots.txt content.
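The template suggestion boils down to keeping the robots.txt content somewhere easier to change than a string literal. Rather than guess at a template setup (direct_to_template was later retired in favor of the class-based TemplateView), here is a plain-Python sketch of a related approach: hold the rules as data and render the text from them. The build_robots_txt helper and its rules format are hypothetical, not part of Django.

```python
def build_robots_txt(rules):
    """Render robots.txt text from {user_agent: [disallowed path prefixes]}.

    Hypothetical helper: an empty list means "allow everything"
    for that agent.
    """
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow line is the standard way to allow everything.
        lines += [f"Disallow: {p}" for p in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(build_robots_txt({"*": ["/admin/", "/drafts/"]}))
```

A view could then return HttpResponse(build_robots_txt(RULES), content_type="text/plain"), where RULES is your own (hypothetical) dict.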

Comment from Paul, later that day

Adrian -- good points both. I've updated the post with the more correct regex.

Comment from James Bennett, later that day

There's also a Django application floating around that lets you use the admin to manage the rules which will be placed in the robots.txt file:

http://github.com/jezdez/django-robots/tree/master

Comment from Paul, 2 days later

James - thanks for that. The one-liner method is good for absurdly short `robots.txt` only!

Comment from django clark, 13 weeks later

I only use robots.txt to block my admin section, so this works for me.

Comment from Paul, 13 weeks later

Clark, just remember that robots.txt doesn't really "block" anything -- it just tells well-behaved search spiders what you don't want indexed.

If your goal is to keep your admin page out of Google, then adding it to robots.txt will work -- though I'm guessing there are no links to it, so Google wouldn't know about it anyway. If your goal is to increase security, you should think again.
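The "well-behaved" part is the whole story: a crawler has to choose to read robots.txt and check each URL against it. Here is a sketch of that voluntary check using Python's standard urllib.robotparser; the URLs and agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /admin/"


def polite_can_fetch(url, agent="ExampleBot"):
    """The check a well-behaved crawler performs voluntarily.

    Nothing forces a client to call this; robots.txt is advisory only.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print(polite_can_fetch("http://example.com/admin/"))  # False: crawler skips it
print(polite_can_fetch("http://example.com/blog/"))   # True
```

A client that never runs this check can still request /admin/ directly; actual protection comes from authentication, which Django's admin already requires.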

(Apologies if you know all this already.)
