Posts tagged: SPAM

More on "Splogs"

“Splog” as a label for spam blogs seems to be taking off. I’m not crazy about it, because I think the challenges posed by fake-blog spam sites, and the possible solutions to them, overlap heavily with those of fake-portal and fake-search-engine link farms. The difference matters mostly to people who run blog indexing services.

Not to discount their needs or their efforts. J. Scott Johnson, CTO of Feedster, weighs in today with a piece in Online Media Daily. One of his best points: “Can the war on splogs be won? No.” In other words, expect to deter and minimize blog spam, not to eliminate it.

Fixing Web Spam

Over on the Technorati blog I see that there’s a summit on web spam happening next week. That’s good. Link farms and spam blogs have been driving me batty.

For combatting the phenomenon from inside tools like Technorati, IceRocket, Feedster, Google Blog Search, and so on, I think our best bet may be collaborative reporting similar to the Razor or Pyzor email-spam-reporting networks.

Last month Blogger, following the Craigslist model, introduced a “Flag” button at the top of every Blogspot-hosted blog, which is on the right track. But nobody except Google has access to that information. A shared reporting system would mean that before I added an alleged blog to my index, or aggregator service, or whatever, I could query that central database to see if that URL had already been flagged as spam by other users.
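To make the idea concrete, here is a rough sketch of what such a shared flag database might look like from the client side. Everything in it is an assumption for illustration: the `FlagRegistry` class, the `normalize` helper, and the reporter-count threshold are hypothetical, and an in-memory dictionary stands in for the central server. No existing service is known to work this way.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path so trivial variants match."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


class FlagRegistry:
    """Hypothetical in-memory stand-in for a shared spam-flag database."""

    def __init__(self, threshold: int = 3):
        # Number of distinct reporters needed before a URL counts as flagged
        # (an assumed policy, not something any real service specifies).
        self.threshold = threshold
        self._reports = defaultdict(set)  # normalized URL -> reporter ids

    def flag(self, url: str, reporter: str) -> None:
        """Record that `reporter` considers `url` spam (repeat flags are ignored)."""
        self._reports[normalize(url)].add(reporter)

    def report_count(self, url: str) -> int:
        """Return how many distinct reporters have flagged `url`."""
        return len(self._reports.get(normalize(url), ()))

    def is_flagged(self, url: str) -> bool:
        """True once enough distinct reporters have flagged `url`."""
        return self.report_count(url) >= self.threshold
```

An indexing service would call `is_flagged(url)` before adding a new blog. Counting distinct reporters rather than raw flags is one simple guard against a single user flooding the database, though a real system would need much more than that to resist abuse.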

Google blog search

Google has launched a blog search tool. Given how long it took them to get around to it, it’s rather underwhelming. Also, I’m seeing a lot of spam blogs in the results – despite my recent attempts to mock such sites into oblivion, they seem to be flourishing. Some Craigslist-style flagging options (also now offered by some blog hosting services) are sorely needed.

Fun with link farms

I’ve really started to get fed up with link farms, spam blogs, and other wastes of cyberspace that exist merely to trick naive users into a few AdSense clicks. Luckily, many of these sites are not very well constructed, so it’s possible to have some fun at their expense.

At some point I’d like to organize a competition where people submit screenshots and URLs from dopey sites that can be made to embarrass themselves. Here’s my submission (via my link redirector, to prevent any transfer of Google juice).

Unexpanded Macro Spam

Verbatim headers from a spam I recently received:

Subject: STR_RNDLEN(2-4)}{EXTRA_TIME_4} {WORD}
Date: {DATE}

That’s not going to sell much {PRODUCT}…

Spam stats

One technical interest I haven’t written much about here is spam. I have a fairly aggressive anti-spam setup, and I have a simple spam statistics page that gives hourly breakdowns. But what I’ve wanted for a long while is some way to aggregate spam stats from other servers into a sort of spam weather report. There are all sorts of reasons why this is impossible to do perfectly – people have different criteria for what constitutes spam, for one – but I still think a useful model for sharing data could be worked out. People who are already generating spam stats could publish their data in a microformat, for example. Alternatively, they could submit periodic automatic reports to a central server, which would then make the stats available in machine-readable form. The key would be to make it easy for people to make their data available.
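As a rough illustration of the central-server variant, the sketch below merges hourly reports from several servers into one machine-readable summary. The report fields (`server`, `hour`, `spam`, `total`) and the `aggregate` function are assumptions made up for this example, not an existing format. It also sidesteps the hard problem mentioned above, that different people count different things as spam.

```python
import json
from collections import defaultdict


def aggregate(reports):
    """Combine per-server hourly reports into one summary per hour.

    Each report is a dict with hypothetical keys:
    server, hour (ISO 8601 string), spam, total.
    """
    hours = defaultdict(lambda: {"spam": 0, "total": 0, "servers": set()})
    for r in reports:
        bucket = hours[r["hour"]]
        bucket["spam"] += r["spam"]
        bucket["total"] += r["total"]
        bucket["servers"].add(r["server"])

    summary = []
    for hour in sorted(hours):
        b = hours[hour]
        # Guard against an hour in which no mail was seen at all.
        ratio = b["spam"] / b["total"] if b["total"] else 0.0
        summary.append({
            "hour": hour,
            "servers": len(b["servers"]),
            "spam": b["spam"],
            "total": b["total"],
            "spam_ratio": round(ratio, 3),
        })
    return summary


if __name__ == "__main__":
    # Made-up sample numbers, purely to show the shape of the output.
    sample = [
        {"server": "a", "hour": "2005-10-01T13:00", "spam": 80, "total": 100},
        {"server": "b", "hour": "2005-10-01T13:00", "spam": 20, "total": 100},
    ]
    print(json.dumps(aggregate(sample), indent=2))
```

Summing raw counts lets busy servers dominate the ratio. Averaging each server's own ratio would weight small servers equally instead, and which is the better "weather report" is an open design question.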