E-Scribe : a programmer’s blog

About Me

I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.

Book

I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!

Colophon

Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.

Stuff I Use

bitbucket, Django, Emacs, FreeBSD, Git, jQuery, LaunchBar, Markdown, Mercurial, OS X, Python, Review Board, S3, SQLite, Sublime Text, Ubuntu Linux

Spam Report

At least 236428 pieces of comment spam killed since 2008, mostly via Akismet.

Fixing Web Spam

Over on the Technorati blog I see that there's a summit on web spam happening next week. That's good. Link farms and spam blogs have been driving me batty.

For combatting the phenomenon from inside tools like Technorati, IceRocket, Feedster, Google Blog Search, and so on, I think our best bet may be collaborative reporting similar to the Razor or Pyzor email-spam-reporting networks.

Last month Blogger, following the model of Craigslist, introduced a "Flag" button at the top of every Blogspot-hosted blog, which is on the right track. But nobody except Google has access to that information. A shared reporting system would mean that before I added an alleged blog to my index, or aggregator service, or whatever, I could query that central database to see if that URL had already been flagged as spam by other users.
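Here's a rough sketch of what that lookup might look like from the indexer's side. Everything in it is hypothetical: the `FlagRegistry` class is an in-memory stand-in for a central database that doesn't exist yet, and the names `normalize`, `should_index`, and the threshold of three reporters are my own assumptions, not anybody's actual API.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to lowercase host plus path so trivial variants match."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


class FlagRegistry:
    """Hypothetical stand-in for the shared spam-report database."""

    def __init__(self):
        # normalized URL -> set of reporter ids, so repeat reports
        # from the same reporter are only counted once
        self._reporters = {}

    def report(self, url, reporter):
        self._reporters.setdefault(normalize(url), set()).add(reporter)

    def flag_count(self, url):
        return len(self._reporters.get(normalize(url), ()))


def should_index(registry, url, threshold=3):
    """Index a URL only if fewer than `threshold` distinct users flagged it."""
    return registry.flag_count(url) < threshold
```

An aggregator would call `should_index(registry, url)` before adding a feed, and call `registry.report(url, reporter)` whenever one of its own users hits a "Flag" button. A real service would sit behind a network API rather than a Python object, but the query-before-indexing shape would be the same.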

This tech is already well-proven with email spam. I run Pyzor on my mail server, and when messages come through to my spamtrap addresses, they get immediately reported so that other users can benefit from that information. Likewise I benefit from the reports made by other users.

Obviously the system would have to be protected against poisoning by vindictive spammers, who might be tempted to report, say, all the URLs in the Technorati Top 100.
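One way to blunt that kind of poisoning, sketched below, is to weight each report by how much the reporter is trusted and to route well-known URLs to a human instead of blocking them automatically. This is an assumption on my part about how such a system could work, not a description of any existing one; the trust values, the `protected` set, and the threshold are all made up for illustration.

```python
def spam_score(reports, trust, default_trust=0.1):
    """Sum the trust weights of the distinct reporters for one URL.

    Unknown reporters get a small default weight, so a pile of
    throwaway accounts counts for little.
    """
    return sum(trust.get(reporter, default_trust) for reporter in set(reports))


def verdict(url, reports, trust, protected=frozenset(), threshold=2.0):
    """Return "ok", "spam", or "review" for a reported URL."""
    if spam_score(reports, trust) < threshold:
        return "ok"
    # Well-known sites are never blocked automatically; a person checks them.
    return "review" if url in protected else "spam"
```

With this scheme, two established reporters are enough to flag an unknown URL, while five brand-new accounts ganging up on the same URL are not. How reporters earn trust in the first place is the hard part, and this sketch doesn't attempt it.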

Lots of other people have written about this in the past few months: David Sifry of Technorati, for one. There's also the Fighting Splog blog, and services like Splogreporter.com that may eventually be used in the way I'm describing. When Google launched their new blog search tool, it was immediately criticized for being full of spam blogs. That's an astounding oversight on their part, but mostly it points up the fact that there is as yet no standard for attacking this problem. Let's create one.

Saturday, September 17th, 2005
