I'm Paul Bissex, and e-scribe.com is my consulting business. I build web applications using open source software, especially Django. In the '90s I did graphic design for newspapers and magazines. Then I wrote technology commentary and reviews for Wired, Salon.com, Chicago Tribune, and lots of little places you've never heard of. Feel free to email me.
I'm co-author of "Python Web Development with Django", an excellent guide to my favorite web framework. Published by Addison-Wesley, it is available from Amazon and your favorite technical bookstore as well.
Built using Django, served by Apache and mod_wsgi. The database is SQLite. The operating system is FreeBSD, on a VPS hosted at Johncompanies.com. Comment-spam protection by Akismet. Vintage topo imagery from the Maptech archive. The markup engine is Markdown.
Akismet, del.icio.us, Django, dpaste.com, Emacs, FreeBSD, Freenode, jQuery, LaunchBar, MacPorts, Markdown, Mercurial, OS X, Postfix, Python, SQLite, Subversion, TextMate, Trac, Ubuntu Linux, wmii
At least 71062 pieces of comment spam killed since January 2008, mostly via Akismet.
Today I'm launching my first Google App Engine site. While I built it largely to play with GAE, it is also useful in its own right (I like to think so anyway). It does two different things:
Link shortening without redirection. Put in a godawful long Amazon link and get back a shorter Amazon link. Works with eBay and a few others too. I welcome recipes for other sites. (For the programmers in the audience, which is most of you -- yes, the processing is via regular expressions.)
It does some basic checks to confirm that the shortened URL returns the same page as the original one.
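To give a flavor of what a regex "recipe" could look like, here is a minimal sketch. It is not the site's actual code; the pattern, the function name `shorten_amazon`, and the made-up ASIN in the usage line are all assumptions for illustration. It relies on the fact that Amazon product URLs carry a 10-character product ID after `/dp/` or `/gp/product/`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical recipe: find the 10-character product ID (ASIN) in the path.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")

def shorten_amazon(url):
    """Return a shorter link to the same Amazon product, or None if no recipe applies."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not (host == "amazon.com" or host.endswith(".amazon.com")):
        return None
    match = ASIN_RE.search(parts.path)
    if match is None:
        return None
    # Keep the original scheme and host; drop the title slug, ref tags, and query string.
    return f"{parts.scheme}://{parts.netloc}/dp/{match.group(1)}"

# "B00EXAMPLE" is a placeholder ID, not a real product.
long_url = "http://www.amazon.com/Some-Long-Product-Title/dp/B00EXAMPLE/ref=sr_1_1?ie=UTF8&qid=1"
short_url = shorten_amazon(long_url)
```

A real recipe would still want the follow-up check described above, since a pattern like this only guesses that the short form points at the same page.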
Link expansion. Put in a link from a URL shortening/redirection service, e.g. bit.ly, and see where it redirects to. Works with a slew of popular link-shorteners, including the house brands goo.gl and nyti.ms.
Some of the shortening services do offer a way to see the link target before you visit it, but they're all different; this presents a simple unified interface to that feature.
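The general idea behind link expansion can be sketched as follows: ask the shortener for the link with a HEAD request, read the `Location` header of the redirect, and repeat until there are no more redirects. This is only an illustration under assumptions, not how the site is implemented; the names `http_location` and `expand` and the hop limit are invented here.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def http_location(url, timeout=10):
    """Send one HEAD request; return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()

def expand(url, lookup=http_location, max_hops=5):
    """Follow redirects one hop at a time and return the last URL reached."""
    seen = [url]
    for _ in range(max_hops):
        target = lookup(seen[-1])
        if target is None or target in seen:  # no redirect, or a redirect loop
            break
        seen.append(target)
    return seen[-1]
```

Passing the lookup function as a parameter makes it easy to try `expand` against a plain dictionary of fake redirects before pointing it at the network.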
There's a bookmarklet too. If you have someone in your online life who frequently bombards you with, say, mile-long eBay links, tell them about it.
I move between a couple different computers regularly: my old 12" PowerBook and the 15" MacBook Pro my job provides me with. Like all multi-computer users I periodically bump up against the challenges of what files (and versions) are where, especially when there's work in progress.
To further complicate things, I also have an extra laptop running Ubuntu. And sometimes I just SSH to my web server from somebody else's machine.
I spent a while thinking about solutions. Some people keep a "master" home directory on a server, using rsync to pull new copies (or freshen old copies) on machines where they work. Being an rsync fan, I tried this approach. After my first accidental rsync --delete casualty, though, I started thinking about ways to preserve history.
That's when the ideal solution hit me (making a big resonant "DUH" sound): distributed version control. Perfect synchronization: check. Multi-platform clients: check. Full history: check.
I created a Mercurial repository on my web server, then cloned it out to the two laptops.
For stuff that needs to be secure, I decided that simple command-line encryption was the answer (hence this tweet from a while back with a Blowfish encrypt/decrypt one-liner). And I use SSH for transport, so even the plaintext stuff is safe from in-transit snooping.
I call the synced directory "syncbox". It contains a little script for keeping things in sync. It amounts to these steps:
hg addremove
hg commit -m "Update"
hg push
hg fetch
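The steps above can be wrapped in a few lines of Python. This is a sketch, not the actual syncbox script: the function name and the error handling are assumptions. It also assumes Mercurial's bundled fetch extension is enabled, since `hg fetch` is not available otherwise.

```python
import subprocess

# The four steps, in the order listed above.
STEPS = [
    ["hg", "addremove"],
    ["hg", "commit", "-m", "Update"],
    ["hg", "push"],
    ["hg", "fetch"],
]

def sync(repo_dir, run=subprocess.call):
    """Run each step inside repo_dir; return a list of (command, exit code) pairs."""
    results = []
    for cmd in STEPS:
        code = run(cmd, cwd=repo_dir)
        results.append((cmd, code))
        # hg commit and hg push exit with 1 when there is nothing to do,
        # so only codes other than 0 and 1 are treated as fatal (an assumption).
        if code not in (0, 1):
            break
    return results
```

Calling `sync("/path/to/syncbox")` (a placeholder path) runs the real commands; passing a different `run` function allows a dry run that only records what would be executed.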
Ironically, after having set all this up, I got an invite to try Dropbox, a nifty-looking service that offers many of the same benefits and many other features besides (e.g. desktop OS integration, selective file sharing, browser-based access option). About all I can tout for advantages of my approach are: 1) unlimited history (Dropbox gives you 30 days), 2) no additional fees if I exceed 2G of storage, and 3) I control it completely.
At work I still mostly use Subversion for version control. Its main selling points: stable, performs as expected, integrates nicely with Trac, holds all our old stuff (legacy inertia).
Note that "pain-free branching and merging" is not on that list. (And don't give me the old "branching is cheap in svn!" line. It's not about the branching, it's about the merging.) A couple years ago I started also using Mercurial and plan to eventually replace svn with it entirely. The aspect of Mercurial that made my life better recently is its support for branching and merging.
The scenario: an important internal web app (in use all day every school day) needed some significant changes on a short timetable. Normally I'd work on the app thus: edit the staging copy, commit, update the live copy. I didn't want to take that approach here. I knew that during the development window there might arise unrelated urgent change requests; I wanted to keep the new code isolated during development, but also deploy and track those unrelated urgent changes. Branching seemed like the right approach.
I could have made a full clone of the app (hg clone mainrepo newrepo). However, handling environment dependencies (web server, PythonPath, database) would have added time and fussiness to the job, and time was in short supply. So, using Mercurial's named-branches feature, I made a new branch (hg branch newstuff) right inside the fully-functional staging copy of the app. That way I was able to develop and test as usual, secure that my unproven work-in-progress was not "polluting" the current app's revision history.
To handle "unrelated urgent changes" as mentioned above, I'd:
switch to the default branch (hg update -r default)
make and commit the urgent change, then update the live copy
switch back to the new branch (hg update -r newstuff)

It took me a couple tries to understand how branch-switching worked, but it's simple: you really are updating your working directory to a new revision; it just happens to be a revision stored in a different branch from the current one.
It was fun looking at the graph (via HgWeb) and seeing my two parallel branches with their individual commits.
The moment of truth came at the end of the day Friday, when it was time to merge the tested and complete "newstuff" code with the current live codebase. It was dead simple, and effectively instantaneous. Condensed version: hg update -r default; hg merge -r newstuff; hg ci -m "merged new stuff". Followed by: update live copy and let out a big sigh.
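For readers who find the branch-switching idea easier to see in code, here is a toy model. It is emphatically not how Mercurial is implemented; the `ToyRepo` class and its method names are invented for illustration. It only captures two points from the story: updating to a branch just moves the working directory to that branch's newest revision, and a merge is a commit with two parents.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Changeset:
    rev: int
    branch: str
    parents: tuple
    note: str

class ToyRepo:
    def __init__(self):
        root = Changeset(0, "default", (), "initial")
        self.log = [root]
        self.parent = root        # revision the working directory is based on
        self.branch = "default"   # branch name the next commit will carry

    def tip_of(self, branch):
        """Newest changeset on the named branch."""
        return max((c for c in self.log if c.branch == branch), key=lambda c: c.rev)

    def set_branch(self, name):   # roughly `hg branch NAME`
        self.branch = name

    def update(self, branch):     # roughly `hg update -r BRANCH`
        self.parent = self.tip_of(branch)
        self.branch = branch

    def commit(self, note, other_parent=None):
        parents = (self.parent.rev,)
        if other_parent is not None:
            parents += (other_parent.rev,)
        changeset = Changeset(len(self.log), self.branch, parents, note)
        self.log.append(changeset)
        self.parent = changeset
        return changeset

    def merge(self, branch, note):  # roughly `hg merge -r BRANCH` plus `hg ci`
        return self.commit(note, other_parent=self.tip_of(branch))

repo = ToyRepo()
repo.set_branch("newstuff")
repo.commit("start new feature")      # rev 1, on newstuff
repo.update("default")
repo.commit("urgent fix")             # rev 2, on default
repo.update("newstuff")
repo.commit("finish new feature")     # rev 3, on newstuff
repo.update("default")
merged = repo.merge("newstuff", "merged new stuff")
print(merged.parents)                 # (2, 3)
```

The two parents of the final changeset are the tips of the two parallel lines of work, which is the shape the graph view shows.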
Spam is occupying more than its customary share of my attention in recent weeks. I've long had a morbid fascination with sleazy human communication (hence Purportal.com). That makes the always-relentless stream of spam, though not exactly welcome, at least interesting.
Spam volume also seems to have increased during this period. The number of spam attempts my mail server rejects per day had been steady at around 3,000 for months. Now it's back up around 5,000 or 6,000.
I run my own mail server and fight spam via greylisting, blacklisting, and other strict technical rules. This setup rejects 99+% of the spam aimed at the domains I host, but some still gets through to me. Never enough to displace real mail, but enough to keep my little hobby-interest alive. Here are some of the spam highlights of my summer so far:
After one too many identical HTML spams, I took the rare step of adding a custom rule to my mail server config. I started rejecting all mail with "Content-Type: text/html; charset=us-ascii". In this age of Unicode, that's turned out to be a pretty safe bet. Lots of rejections and no known false positives.
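The matching logic of that rule is simple enough to sketch in Python. The real rule lives in the mail server's configuration, not in Python, so treat this as an illustration only; the function name `should_reject` is made up.

```python
from email import message_from_string

def should_reject(raw_message):
    """True if the top-level Content-Type is text/html with charset us-ascii."""
    msg = message_from_string(raw_message)
    content_type = msg.get_content_type()              # lowercased, e.g. "text/html"
    charset = (msg.get_content_charset() or "").lower()
    return content_type == "text/html" and charset == "us-ascii"

spam = "From: a@example.com\nContent-Type: text/html; charset=us-ascii\n\n<p>Buy now</p>\n"
ham = "From: b@example.com\nContent-Type: text/html; charset=utf-8\n\n<p>Hello</p>\n"
```

Only the top-level header is examined, so a multipart message that happens to contain a us-ascii HTML part would not be rejected by this check.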
I received a weird email about money via Craigslist. It looked like a response to an ad -- one I'd never seen before, and certainly hadn't placed. Naturally my first thought was that the Craigslist bit was all a ruse, but a look at the message headers showed it was real: it had been sent via Craigslist in response to an ad with my email address attached. In other words, a Craigslist ad that had been created (copied verbatim from a legit ad) just to send spam to me via Craigslist's email forwarding feature.
I spent a few minutes trying to convince emusic.com (via email) of the fact that since I received spam at an email address that I had invented purely for use with their service, and which had never been used for anything else, this meant that somebody had poached their list from inside. They are still thinking about this silently.
I encountered a new form of referrer-spam. Remember referrer spam? Spammers would put their URLs in the HTTP_REFERER header when hitting blogs and other websites that had dynamically generated lists of "top referrers", then the spammers' sites would show up in those lists. Well, this week I saw an inscrutable but surely related anomaly in the headers of some requests made to one of my sites (which I was looking at for other reasons, not spam-hunting). This HTTP_REFERER header was a giant comma-delimited list of approximately 10 or 15 URLs.
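One way to flag requests like these would be to count how many URLs appear in the referrer value. This is a hypothetical sketch; the function names and the threshold are assumptions, not something taken from the site's logs or code.

```python
import re

# Each URL in the list starts with a scheme, so count scheme prefixes
# rather than splitting on commas (commas are legal inside URLs).
URL_START = re.compile(r"https?://", re.IGNORECASE)

def count_referrer_urls(referer_value):
    return len(URL_START.findall(referer_value))

def is_suspicious_referrer(referer_value, max_urls=1):
    """A normal referrer is a single URL; a long list of them is a red flag."""
    return count_referrer_urls(referer_value) > max_urls

# Placeholder domains standing in for the spammers' URLs.
spammy = ", ".join(f"http://spam{i}.example/" for i in range(12))
```

A legitimate referrer can occasionally embed a second URL in its query string, so the threshold may need to be raised for real traffic.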
And finally, there was the phishing message I received today. It was a fake eBay notice, with the usual "click here to resolve the dispute" links. Those links were supposed to take the victim to a fake eBay page the scammers had set up (where the victim would type in all sorts of exploitable personal information). Looking at the message's raw source, I noticed something very odd -- the pages they were trying to link to were on an FTP server in Russia. Even weirder and better, the link code contained their FTP username and password! A minute later I was logged into their FTP server, looking at the one file there: the fake eBay page.
This was a darkly humorous reminder that the international spam-and-scam business is, from what I can see, a refuge for IT people (or wannabes) with poor skills and poorer ethics. So by this point I was kind of feeling bad for the incompetent underling who had put this thing together for his terrible boss.
However, I didn't let my compassion interfere with my sense of justice and fun. I replaced their fake eBay page with my own content, a much simpler message in plain text: "We are scammers."
Among the many anti-spam measures on my mail server -- which help me reject 5000 spam attempts per day -- is SPF. SPF allows domain name owners to specify which mail servers are allowed to send their mail. That makes it an excellent way to detect address forgeries, a favorite spammer tool.
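To make the mechanism concrete, here is a heavily simplified sketch of an SPF check. It is an illustration under assumptions: the record is passed in as a string (a real check fetches it from DNS), only the `ip4`, `ip6`, and `all` mechanisms are handled, and the function name `check_spf` is invented. The addresses in the example are reserved documentation ranges.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def check_spf(record, sender_ip):
    """Evaluate sender_ip against an SPF record string, left to right."""
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    for term in terms[1:]:
        qualifier = "+"                      # default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":                    # matches every sender
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        # Mechanisms such as a, mx, and include need DNS lookups; this sketch skips them.
    return "neutral"                         # nothing matched

record = "v=spf1 ip4:192.0.2.0/24 -all"
approved = check_spf(record, "192.0.2.25")      # server listed by the domain owner
forged = check_spf(record, "198.51.100.7")      # server not listed
```

With this record, mail from a listed server passes and mail from anywhere else fails, which is what makes forged sender addresses detectable.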
One of the early questions raised about SPF was: won't spammers just buy their own domains and set up their own SPF records that say it's all OK? You can read the answer in the SPF FAQ, but the short version is: Yes, they will, but it won't give them a free pass.
That's because if spammers register a domain, publish SPF records for it, and send spam, they've identified that domain as one intended to be used for spam. Very good blacklist fodder.
With that in mind, here's a list of about 50 domain names that have recently been used to send me spam. All of these have published SPF records, and all the spam I received was from servers approved by those SPF records.
In other words, as far as I can tell, these are domains that exist primarily, if not purely, to send spam.
If for some reason a perfectly innocent non-spammy domain of yours has made it into this list, please let me know. (You might have to use my contact form, since I've already blacklisted all these domains!)
A different kind of URL shortener
The syncbox
Branching and merging in real life
Summer Spam
SPF-enabled spam domains
Copyright 2010
by Paul Bissex
and E-Scribe New Media