robots.txt via Django, in one line
A significant difference between developing Django sites and static-HTML-based approaches (among which I count PHP and the like) is that static files, aka “media”, live in a dedicated spot.
Sometimes you need a piece of static content to be available at a specific URL outside your media root; robots.txt is the classic example. This can be done in pure Django (i.e. without even touching your Apache configuration), and is especially nice if your robots.txt content is short. The example below serves a basic “keep out” configuration.
At the top of your root URLconf, add this import:
from django.http import HttpResponse
and below, among your list of URL patterns, add:
(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain"))
The lambda r bit is a concise way of creating a function object which accepts (and discards) the HttpRequest object that Django provides to all views. The “mimetype” setting (aka “content_type” in Django 1.0) is important too, because robots don’t like text/html.
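For reference, here is the whole thing in context: a minimal sketch of a root urls.py, using the old-style patterns() URLconf this post assumes.

from django.conf.urls.defaults import *
from django.http import HttpResponse

urlpatterns = patterns('',
    # Serve a "keep out" robots.txt without touching the webserver config
    (r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /", mimetype="text/plain")),
    # ... your other URL patterns ...
)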
So there you have it – a classic one-line (plus an import) robots.txt solution.
Adrian Holovaty commented:
Hey Paul,
Small suggestion: change the regex to anchor it to the end of the string and escape the dot:
r'^robots\.txt$'
Another way of doing this is the direct_to_template generic view. It would be more than one line, but it would give you a little more flexibility in changing the robots.txt content.
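Something like this, for instance (a sketch, assuming a robots.txt template exists in your templates directory):

from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    # Render templates/robots.txt, served as plain text
    (r'^robots\.txt$', direct_to_template, {'template': 'robots.txt', 'mimetype': 'text/plain'}),
)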
Paul commented:
Adrian – good points both. I’ve updated the post with the more correct regex.
James Bennett commented:
There’s also a Django application floating around that lets you use the admin to manage the rules which will be placed in the robots.txt file:
http://github.com/jezdez/django-robots/tree/master
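Per its README (roughly; check the project for current details), you add 'robots' to INSTALLED_APPS and point a URL pattern at its bundled URLconf:

(r'^robots\.txt$', include('robots.urls')),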
Paul commented:
James - thanks for that. The one-liner method is good for absurdly short robots.txt only!
django clark commented:
I only use robots.txt to block my admin section, so this works for me.
Paul commented:
Clark, just remember that robots.txt doesn’t really “block” anything – it just tells well-behaved search spiders what you don’t want indexed.
If your goal is to keep your admin page out of Google, then adding it to robots.txt will work – though I’m guessing there are no links to it, so Google wouldn’t know about it anyway. If your goal is to increase security, you should think again.
(Apologies if you know all this already.)
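If you do want the one-liner to cover just the admin, a variant like this works (a sketch, assuming your admin is mounted at /admin/):

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /admin/", mimetype="text/plain"))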
Mario commented:
Thanks for your post! But perhaps it is better to serve the static robots.txt file with Apache, for example by adding the following lines to httpd.conf. ;)
LoadModule alias_module modules/mod_alias.so
Alias /robots.txt /full/path/to/robots.txt
<Location "/robots.txt">
SetHandler None
</Location>
eng. Ilian Iliev commented:
Wow, that is a really short one. I’ll definitely try it.