Help, crawler bots are eating my website

Category: Computers

Last week I noticed that my web-hosting account was chewing through significantly more resources than usual. I have a pre-paid web-hosting account, which usually costs about $0.10 per day to run. Yet for a few days, my website was running through about $2.00/day (20× the usual amount).

What in heck’s name is going on?

After some log file analysis, it appears that a combination of things happened:

  • My poorly secured, poorly monitored DokuWiki instance got heavily spammed a couple of years ago, generating thousands upon thousands of spam pages.
  • The spam pages got indexed by various search engines.
  • The MJ12bot crawler kept the spam page URLs in its crawling list long after they were deleted, and repeatedly tried to fetch thousands of pages that didn’t exist, every day.
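
For anyone wanting to do a similar log analysis, here is a minimal Python sketch that counts hits and 404 responses per user agent. It assumes the web server writes Apache/Nginx "combined" format logs, which may not match every host. The sample lines, the `summarize` helper and the `LOG_LINE` pattern are made up for illustration and are not taken from my actual logs.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the user agent is the last
# double-quoted field on each line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def summarize(lines):
    """Return (hits, not_found): per-user-agent counts of requests and 404s."""
    hits = Counter()
    not_found = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        hits[agent] += 1
        if match.group("status") == "404":
            not_found[agent] += 1
    return hits, not_found


# Hypothetical sample lines, for illustration only.
BOT = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"
BROWSER = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
SAMPLE = [
    f'192.0.2.1 - - [01/Jan/2020:00:00:01 +0000] "GET /wiki/spam-1 HTTP/1.1" 404 512 "-" "{BOT}"',
    f'192.0.2.1 - - [01/Jan/2020:00:00:02 +0000] "GET /wiki/spam-2 HTTP/1.1" 404 512 "-" "{BOT}"',
    f'192.0.2.1 - - [01/Jan/2020:00:00:03 +0000] "GET /wiki/spam-3 HTTP/1.1" 404 512 "-" "{BOT}"',
    f'198.51.100.7 - - [01/Jan/2020:00:00:04 +0000] "GET / HTTP/1.1" 200 2048 "-" "{BROWSER}"',
]

hits, not_found = summarize(SAMPLE)
for agent, count in hits.most_common():
    print(f"{count:6d} hits, {not_found[agent]:6d} not found  {agent}")
```

On a real server you would pass an open log file to `summarize` instead of `SAMPLE`. A crawler with a large hit count made up almost entirely of 404s is the pattern described above.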

Naturally, each crawler hit used a small amount of processing time, and the cumulative number of crawler hits inflicted the death of a thousand cuts on my web hosting account.

To fix this, I nuked the DokuWiki installation from orbit - no loss, since I wasn’t using it anyway. Then I banned MJ12bot from crawling my site by means of robots.txt. MJ12bot is apparently on the standard list of banned bots, so this is no great loss either.
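
As a rough illustration of what such a ban can look like, the Python sketch below holds a two-line robots.txt rule in a string and checks it with the standard library's `urllib.robotparser`. The rule shown is an assumption about a typical ban, not a copy of my actual robots.txt, and `SomeOtherBot` is a hypothetical crawler name. Keep in mind that robots.txt is only a request: it helps only with crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block MJ12bot from the whole site,
# leave every other crawler unaffected.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# MJ12bot is denied everywhere; a crawler with no matching rule is allowed.
mj12_allowed = parser.can_fetch("MJ12bot", "/any/page")
other_allowed = parser.can_fetch("SomeOtherBot", "/any/page")
print("MJ12bot allowed:", mj12_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Checking the rule this way before uploading it catches typos that would otherwise leave the crawler unblocked, or block everyone by accident.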

The effect is dramatic:

[Screenshot]

My site’s running cost has dropped from $2.00/day back to $0.10/day - as it should be.