I'm Paul Bissex, and e-scribe.com is my consulting business. I build web applications using open source software, especially Django. In the '90s I did graphic design for newspapers and magazines. Then I wrote technology commentary and reviews for Wired, Salon.com, Chicago Tribune, and lots of little places you've never heard of. Feel free to email me.
I'm co-author of "Python Web Development with Django", an excellent guide to my favorite web framework. Published by Addison-Wesley, it is available from Amazon and your favorite technical bookstore as well.
Built using Django, served by Apache and mod_wsgi. The database is SQLite. The operating system is FreeBSD, on a VPS hosted at Johncompanies.com. Comment-spam protection by Akismet. Vintage topo imagery from the Maptech archive. The markup engine is Markdown.
A significant difference between developing Django sites and working with static-HTML-based approaches (among which I count PHP and the like) is that static files, aka "media", live in a dedicated spot.
Sometimes you need a piece of static content to be available at a specific URL outside your media root: robots.txt, for example. This can be done in pure Django (i.e. without even touching your Apache configuration), and it's especially nice if your robots.txt content is short. The example below serves a basic "keep out" configuration.
At the top of your root URLconf, add this import:
from django.http import HttpResponse
and below, among your list of URL patterns, add:
(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain"))
The lambda r bit is a concise way of creating a function object that accepts (and ignores) the HttpRequest object Django passes to every view. The "mimetype" setting (aka "content_type" in Django 1.0) is important too: HttpResponse defaults to text/html, but robots.txt should be served as text/plain. (Later Django versions dropped mimetype altogether, so there you must use content_type.)
So there you have it -- a classic one-line (plus an import) robots.txt solution.
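Which "keep out" rule you write is worth a moment's thought. `Disallow: /*` relies on wildcard matching, an extension that Google honors but that not every robots.txt parser implements; `Disallow: /` says the same thing using plain prefix matching, which any parser following the original convention understands. Here is a quick check with Python's standard-library parser, urllib.robotparser, which (as far as I know) does prefix matching only. The helper name can_fetch and the test path /blog/ are just for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, path, agent="*"):
    """Ask the stdlib parser whether `agent` may fetch `path` under these rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() can be called without fetching anything over the network
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Two ways of writing a "keep out" rule
WILDCARD = "User-agent: *\nDisallow: /*"
PLAIN = "User-agent: *\nDisallow: /"

for rules in (WILDCARD, PLAIN):
    print(repr(rules), can_fetch(rules, "/blog/"))
```

With this parser the wildcard version still allows /blog/ (it treats `/*` as a literal path prefix), while the plain version disallows it.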
Adrian -- good points both. I've updated the post with the more correct regex.
There's also a Django application floating around that lets you use the admin to manage the rules which will be placed in the robots.txt file:
http://github.com/jezdez/django-robots/tree/master
James - thanks for that. The one-liner method is good for absurdly short `robots.txt` only!
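Once a robots.txt grows past a rule or two, it can help to keep the rules as data and generate the text. This is a minimal sketch of that idea; build_robots_txt is a hypothetical helper (not part of Django or of django-robots), and it only handles User-agent and Disallow lines.

```python
from typing import Dict, List


def build_robots_txt(rules: Dict[str, List[str]]) -> str:
    """Render {user_agent: [disallowed_path, ...]} as robots.txt text.

    Hypothetical helper for illustration only; not the django-robots API.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means "nothing is disallowed" for that agent
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # Separate groups with a blank line and end the file with a newline
    return "\n\n".join(groups) + "\n"


print(build_robots_txt({"*": ["/admin/", "/search/"], "BadBot": ["/"]}))
```

In a URLconf, the returned string could be handed to HttpResponse in place of a literal.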
I only use robots.txt to block my admin section, so this works for me.
Clark, just remember that robots.txt doesn't really "block" anything -- it just tells well-behaved search spiders what you don't want indexed.
If your goal is to keep your admin page out of Google, then adding it to robots.txt will work -- though I'm guessing there are no links to it, so Google wouldn't know about it anyway. If your goal is to increase security, you should think again.
(Apologies if you know all this already.)
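To make that concrete: obeying robots.txt happens entirely on the crawler's side. A polite crawler parses the file and checks each URL before requesting it, as in this sketch using Python's standard urllib.robotparser module. The rules, the /admin/ path, and the agent name ExampleBot are hypothetical. A client that skips the check can still request /admin/, and the server will answer as usual.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in the spirit of the comment above: hide only the admin section
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(path, agent="ExampleBot"):
    """A well-behaved crawler asks before fetching; nothing forces other clients to."""
    return parser.can_fetch(agent, path)


for path in ("/admin/", "/admin", "/blog/"):
    print(path, polite_fetch_allowed(path))
```

The match is a plain prefix comparison, so /admin without the trailing slash is not covered by `Disallow: /admin/`.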
Thanks for your post!
But perhaps it is better to serve the static robots.txt file with Apache,
for example by adding the following lines to httpd.conf. ;)
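LoadModule alias_module modules/mod_alias.so
# Map the URL /robots.txt to a file on disk
Alias /robots.txt /full/path/to/robots.txt
# Turn off any handler (e.g. mod_python) set for an enclosing location, so Apache serves the file itself
<Location "/robots.txt">
SetHandler None
</Location>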
Copyright 2012 by Paul Bissex and E-Scribe New Media
Hey Paul,
Small suggestion: change the regex to anchor it to the end of the string and escape the dot:
r'^robots\.txt$'
Another way of doing this is the direct_to_template generic view. It would be more than one line, but it would give you a little more flexibility in changing the robots.txt content.