Comment Spam War
On a light day, I get approximately 22 requests per hour, 24 hours a day, to add spam comments to this site. Sometimes the same IP address shows up repeatedly for a stretch; other times the requests come from hundreds of different IP addresses throughout the day, most of them likely spoofed.
For the last decade, it has been a continuous side project to prevent this with the least intervention from me, without using a third-party service, and with the fewest hurdles for somebody making a legitimate comment. To this day, I have yet to nail down a perfect solution, but what I do to prevent it and what I've learned may be useful to others experimenting with the same problem.
Here's a one-hour snippet of my logs on a typical day, with actual IP addresses masked.
2014-03-19 11:51:49 Comment blacklisted from 175.44.X.X IP(175.44.X.X)
2014-03-19 11:50:09 Comment CAPTCHA code does not match. IP(31.41.X.X)
2014-03-19 11:31:50 Comment blacklisted from 112.111.X.X IP(112.111.X.X)
2014-03-19 11:24:03 Comment blacklisted from 112.111.X.X IP(112.111.X.X)
2014-03-19 11:19:16 Comment CAPTCHA code does not match. IP(146.0.X.X)
2014-03-19 11:18:15 Comment CAPTCHA code does not match. IP(137.175.X.X)
2014-03-19 11:15:50 Comment CAPTCHA code does not match. IP(137.175.X.X)
2014-03-19 11:15:47 Comment CAPTCHA code does not match. IP(137.175.X.X)
2014-03-19 11:15:34 Comment blacklisted from 175.44.X.X IP(175.44.X.X)
2014-03-19 11:13:08 Comment CAPTCHA code does not match. IP(91.207.X.X)
2014-03-19 11:07:15 Comment blacklisted from 175.44.X.X IP(175.44.X.X)
2014-03-19 11:07:06 Comment blacklisted from 175.44.X.X IP(175.44.X.X)
2014-03-19 11:00:22 Comment blacklisted from 112.5.X.X IP(112.5.X.X)
2014-03-19 10:59:49 Comment CAPTCHA code does not match. IP(199.15.X.X)
2014-03-19 10:54:15 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:14 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:13 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:12 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:10 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:10 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:07 Comment CAPTCHA code does not match. IP(216.151.X.X)
2014-03-19 10:54:05 Comment CAPTCHA code does not match. IP(216.151.X.X)
I have not gone to great lengths to gather data about spam comments, but I do gather some. It became obvious after a while that the "algorithm" used to post spam comments changes. Every time I put something new in place to prevent spam comments, they completely stop for a while. Then, all of a sudden, they come back, usually as a trickle at first. This is basically a constant cycle. Letting comment spam through even momentarily makes your website a target indefinitely.
Here are some examples of what I do, but these measures are constantly changing.
Check Referer
When a comment is posted, I check the referer. It must be coming from this website and as an extra check, it must be coming from the right page.
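// Reject the post when the Referer header is missing or names another host.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$splitup = parse_url($referer);
if (!isset($splitup['host']) || strtolower($splitup['host']) != strtolower($config['domain'])) {
    return false;
}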
You can also do this in your .htaccess with something like this:
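<IfModule mod_rewrite.c>
    RewriteEngine On
    # Only inspect POST requests aimed at the comment handler.
    RewriteCond %{REQUEST_METHOD} POST
    RewriteCond %{REQUEST_URI} comment\.php
    # Match when the Referer is not this site or the User-Agent is empty.
    RewriteCond %{HTTP_REFERER} !.*yourdomainname.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    # Redirect the request back to the address it came from.
    RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
</IfModule>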
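The extra check that the post comes from the right page can be done the same way as the host check. Here is a minimal sketch, assuming the expected host and page path are known when the comment is processed. The function name referer_matches_page and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: true only when the Referer names the expected host
// AND the expected page path. The query string is ignored.
function referer_matches_page($referer, $expectedHost, $expectedPath)
{
    $parts = parse_url((string) $referer);
    if (!is_array($parts) || !isset($parts['host'], $parts['path'])) {
        return false; // missing or unparsable Referer
    }
    return strtolower($parts['host']) === strtolower($expectedHost)
        && $parts['path'] === $expectedPath;
}
```

Keep in mind that the Referer header is supplied by the client, so this only filters out the lazier bots.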
Blacklist
I have a blacklist of IP addresses that are not allowed to post comments. The blacklist was initially seeded with all address blocks from China. Those IP addresses accounted for 99% of spam comments to this site. I manually add IP addresses to this regularly if I see repeat offenders.
Here's the basic functionality to check whether an IP address is blacklisted. The blacklist lives in a CSV file where each row holds the start and end address of a blocked range. You could just as well load the CSV into a database.
class Comment {
    // Returns true when $ip falls inside any range listed in blacklist.csv.
    public static function blacklisted($ip) {
        $found = false;
        if (($handle = fopen(dirname(__FILE__).'/blacklist.csv', 'r')) !== FALSE) {
            while (($data = fgetcsv($handle, 100)) !== FALSE) {
                // Skip blank or malformed rows.
                if (isset($data[0], $data[1]) && Comment::ip_in_range($data[0], $data[1], $ip)) {
                    $found = true;
                    break;
                }
            }
            fclose($handle); // close the file even when a match was found
        } else {
            trigger_error("blacklist.csv not found");
        }
        return $found;
    }

    // Compares the addresses as integers; false if any of them fails to parse.
    private static function ip_in_range($start, $end, $ip) {
        $s = ip2long($start);
        $e = ip2long($end);
        $i = ip2long($ip);
        if ($s !== false && $e !== false && $i !== false) {
            return ($i <= $e && $s <= $i);
        }
        return false;
    }
}

if (Comment::blacklisted($_SERVER['REMOTE_ADDR'])) {
    return false;
}
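Address blocks are often published in CIDR notation (for example 10.0.0.0/8) rather than as start and end addresses. If your seed list comes in that form, which is an assumption on my part, a small converter can produce the two columns the CSV needs. This is a sketch for IPv4 only, on 64-bit PHP. The function name cidr_to_range is hypothetical.

```php
<?php
// Hypothetical helper: converts an IPv4 CIDR block into array(start, end),
// or returns false when the input is not a valid block.
// Assumes 64-bit PHP, where ip2long() returns a non-negative integer.
function cidr_to_range($cidr)
{
    $parts = explode('/', (string) $cidr, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[1]) || (int) $parts[1] > 32) {
        return false;
    }
    $ip = ip2long($parts[0]);
    if ($ip === false) {
        return false;
    }
    $bits = (int) $parts[1];
    // Network mask with the top $bits bits set, limited to 32 bits.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    $start = $ip & $mask;                    // first address in the block
    $end = $start | (~$mask & 0xFFFFFFFF);   // last address in the block
    return array(long2ip($start), long2ip($end));
}
```

Each returned pair can be written out as one CSV row for the blacklist check above.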
CAPTCHA
Entering a CAPTCHA code from an image is required to post a comment. I know this works because this alone causes most spam comment attempts to fail. However, it definitely adds a barrier for legitimate commenters. There are many CAPTCHA solutions to choose from.
Limit Number of URLs in a Comment
Surprisingly, this was the latest major update and it worked wonderfully for several months: not a single spam comment until they figured it out. I limited the number of URLs in a comment to 3.
Here's an example for just checking http:// links.
if (substr_count($_POST['cmtcomment'], "http://") > 3) {
    $notice = "Hey that's a lot of links in your comment. You're not a spammer are you? Remove some links from <a href=\"#comment\">your comment</a> to prove it.";
    return false;
}
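The check above only counts lowercase http:// links, so https:// links and mixed-case variants slip past it. A minimal sketch of a broader count might look like this. The helper names count_links and too_many_links are made up for illustration, and the limit of 3 is the one mentioned above.

```php
<?php
// Hypothetical helper: counts http:// and https:// occurrences,
// ignoring letter case.
function count_links($text)
{
    return preg_match_all('~https?://~i', (string) $text);
}

// Hypothetical helper: true when the text has more than $max links.
function too_many_links($text, $max = 3)
{
    return count_links($text) > $max;
}
```

This still misses links written without a scheme, such as www.example.com, so treat it as one signal rather than a complete filter.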
Honeypot Fields
The idea of a honeypot field is great and it works really well. I add hidden fields to the comment form and if any of those fields is populated, the comment is rejected. This flag gets hit a lot which means that the comment spam is mostly automated.
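As a rough illustration, the server-side half of a honeypot can be a few lines. This sketch assumes the form contains two extra inputs hidden from people with CSS. The field names website and email2 and the function name honeypot_tripped are hypothetical.

```php
<?php
// Hypothetical helper: true when any honeypot field came back non-empty.
// A person never sees these fields, so anything in them points to a bot.
function honeypot_tripped(array $post, array $honeypotFields = array('website', 'email2'))
{
    foreach ($honeypotFields as $field) {
        if (!isset($post[$field])) {
            continue; // field absent: nothing was filled in
        }
        $value = $post[$field];
        // An array value (e.g. website[]=x) is never sent by a real browser here.
        if (is_array($value) || trim((string) $value) !== '') {
            return true;
        }
    }
    return false;
}
```

It would be called with $_POST before any other processing, rejecting the comment when it returns true.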
Other Ideas
One idea I've had is to check the amount of time between visiting the page and posting a comment. For example, if that happens within 5 seconds, the post is almost certainly automated: there's no way a human visited the page, went to the bottom of it, and wrote a comment all within 5 seconds.
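I have not built this, but one way to sketch the timing check is to put a signed timestamp in a hidden form field when the page is rendered and verify it when the comment arrives. The signature stops a bot from simply inventing an older timestamp. Everything here is an assumption: the function names, the token format, and the server-side secret.

```php
<?php
// Hypothetical helper: builds "timestamp:signature" for a hidden form field.
// $secret is a server-side key that is never sent to the browser.
function make_form_token($secret, $now)
{
    return $now . ':' . hash_hmac('sha256', (string) $now, $secret);
}

// Hypothetical helper: true when the post should be rejected, either because
// the token is malformed or forged, or because it arrived in under $minSeconds.
function posted_too_fast($token, $secret, $now, $minSeconds = 5)
{
    $parts = explode(':', (string) $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return true; // malformed token
    }
    $expected = hash_hmac('sha256', $parts[0], $secret);
    if (!hash_equals($expected, $parts[1])) {
        return true; // timestamp was tampered with
    }
    return ($now - (int) $parts[0]) < $minSeconds;
}
```

The page would call make_form_token($secret, time()) when rendering the form, and the comment handler would call posted_too_fast($_POST['token'], $secret, time()). A bot can still wait a few seconds or replay an old token, so an upper age limit would be a sensible addition.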
Another idea would be to move to a form of CAPTCHA that requires only a minimal amount of thinking from a person but is hard for AI. An example question would be: "What country borders the United States to the north?"
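A question-based check like that could be sketched as a small list of questions with accepted answers, compared after trimming and lowercasing. The first question is the example above. The second question, the function names, and the idea of storing the question index in the session are assumptions for illustration.

```php
<?php
// Hypothetical question list. Each entry has the question text and the
// accepted answers in lowercase.
function challenge_questions()
{
    return array(
        array(
            'q' => 'What country borders the United States to the north?',
            'a' => array('canada'),
        ),
        array(
            'q' => 'How many days are in a week?',
            'a' => array('7', 'seven'),
        ),
    );
}

// Hypothetical helper: true when $answer matches an accepted answer for the
// question at $index, ignoring case and surrounding whitespace.
function answer_is_correct($index, $answer)
{
    $questions = challenge_questions();
    if (!isset($questions[$index])) {
        return false; // unknown question
    }
    $normalized = strtolower(trim((string) $answer));
    return in_array($normalized, $questions[$index]['a'], true);
}
```

The index of the question shown should be kept server-side, for example in the session, so the poster cannot choose which question gets checked. A short fixed list is also easy for a spammer to learn, so the questions would need to rotate.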