I'm Paul Bissex, and e-scribe.com is my consulting business. I build web applications using open source software, especially Django. I teach photographers web design and professional skills. In the '90s I did graphic design for newspapers and magazines. Then I wrote technology commentary and reviews for Wired, Salon.com, Chicago Tribune, and lots of little places you've never heard of. Feel free to email me.
I'm co-author of "Python Web Development with Django", an excellent guide to my favorite web framework. Its strong points include an introduction to Python and better coverage of Django 1.0 than nearly any other book. Published by Addison-Wesley, it is available from Amazon and your favorite technical bookstore.
A significant difference between developing Django sites and static-HTML-based approaches (among which I count PHP and the like) is that in Django static files, aka "media", live in a dedicated spot.
Sometimes you need a piece of static content to be available at a specific URL outside your media root; robots.txt is the usual example. This can be done in pure Django (i.e. without even touching your Apache configuration), and is especially nice if your robots.txt content is short. The example below serves a basic "keep out" configuration.
At the top of your root URLconf, add this import:
from django.http import HttpResponse
and below, among your list of URL patterns, add:
(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain"))
The `lambda r` bit is a concise way of creating a function object that accepts (and discards) the HttpRequest object Django passes to every view. The "mimetype" argument (aka "content_type" in Django 1.0) is important too: without it Django serves the response as text/html, and robots don't like text/html.
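If the lambda looks too terse, here is a minimal sketch of the same view written both ways. So that it runs without Django installed, it uses a hypothetical `Response` namedtuple as a stand-in for `django.http.HttpResponse`; the names `ROBOTS_BODY`, `robots_lambda`, and `robots_txt` are made up for illustration too.

```python
from collections import namedtuple

# Hypothetical stand-in for django.http.HttpResponse, so this sketch
# runs on its own. In a real project you would use HttpResponse instead.
Response = namedtuple("Response", ["content", "content_type"])

# The "keep out" body served at /robots.txt.
ROBOTS_BODY = "User-agent: *\nDisallow: /"

# One-liner form: a lambda that accepts the request and ignores it.
robots_lambda = lambda request: Response(ROBOTS_BODY, "text/plain")


# The same view spelled out as a named function.
def robots_txt(request):
    """Return the robots.txt body; the request argument is unused."""
    return Response(ROBOTS_BODY, "text/plain")
```

Either callable could be dropped into the URL pattern, since all Django needs from a view is something that takes a request and returns a response. The named version is easier to extend once the content grows past a line or two.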
So there you have it -- a classic one-line (plus an import) robots.txt solution.
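To sanity-check what a well-behaved crawler would make of that response, Python's standard library ships a robots.txt parser. The sketch below assumes the body is the two-line "keep out" rule with `Disallow: /`; the helper name `is_allowed` and the user agent `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The body the view returns, assuming the rule is "Disallow: /".
ROBOTS_BODY = "User-agent: *\nDisallow: /"


def is_allowed(body, user_agent, url):
    """Parse a robots.txt body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Print the parser's verdict for a few sample paths.
    for path in ("/", "/admin/", "/blog/some-post/"):
        print(path, is_allowed(ROBOTS_BODY, "ExampleBot", path))
```

With the "keep out" body, the parser should report every path as off limits. This only checks how a compliant parser reads the rules; it says nothing about crawlers that ignore robots.txt.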
Adrian -- good points both. I've updated the post with the more correct regex.
There's also a Django application floating around that lets you use the admin to manage the rules which will be placed in the robots.txt file:
http://github.com/jezdez/django-robots/tree/master
James - thanks for that. The one-liner method is good for absurdly short `robots.txt` only!
I only use robots.txt to block my admin section, so this works for me.
Clark, just remember that robots.txt doesn't really "block" anything -- it just tells well-behaved search spiders what you don't want indexed.
If your goal is to keep your admin page out of Google, then adding it to robots.txt will work -- though I'm guessing there are no links to it, so Google wouldn't know about it anyway. If your goal is to increase security, you should think again.
(Apologies if you know all this already.)
Copyright 2010 by Paul Bissex and E-Scribe New Media
Hey Paul,
Small suggestion: change the regex to anchor it to the end of the string and escape the dot:
r'^robots\.txt$'
Another way of doing this is the direct_to_template generic view. It would be more than one line, but it would give you a little more flexibility in changing the robots.txt content.