Blog Comment Spam and reCAPTCHA

by Brian Hassel 15. June 2011 13:08

I recently noticed a huge increase in blog-comment spam. Some entries provided a few laughs -- poor grammar, misspellings, and insults -- but it was clear that their only purpose was to plant a link back to the spammer's website. After a few late-night emails woke me up, I decided to implement Google's reCAPTCHA service. BlogEngine.NET makes this very easy, and other websites can use reCAPTCHA without much difficulty too, since it is provided as a web service.
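For sites that are not on BlogEngine.NET, the server-side half of the check might look roughly like the Python sketch below. It is a sketch under assumptions, not the code this blog runs: it targets Google's current `siteverify` endpoint (which differs from the API available when this post was written), and the function name `is_human` and the injectable `post` parameter are my own inventions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Google's current verification endpoint, not the 2011-era one.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _http_post(url, fields):
    """POST form-encoded fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return resp.read().decode("utf-8")


def is_human(secret, token, remote_ip=None, post=_http_post):
    """Return True only if the service reports the CAPTCHA as solved.

    `secret` is the site's private key, `token` is the value the widget
    added to the submitted comment form. `post` is a hypothetical hook
    that lets the HTTP call be swapped out, e.g. in tests.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    try:
        body = json.loads(post(VERIFY_URL, fields))
    except (OSError, ValueError):
        # Network failure or unparsable reply: treat as not verified.
        return False
    return isinstance(body, dict) and body.get("success") is True
```

A comment handler would call `is_human(...)` before saving the comment and reject the submission when it returns `False`. Failing closed on network errors is a design choice: it blocks spam during an outage at the cost of also blocking real commenters.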

The coolest thing about reCAPTCHA, besides being free, is that every time someone completes a CAPTCHA, they also help to improve optical character recognition. In a nutshell, two obfuscated words are presented to the user. In most cases, one is known, and the other is from a scanned book or newspaper. If the user gets the known word right, their 'guess' at the other word is used as data for the OCR statistical analysis engine. Collect enough data this way, and the OCR software 'learns' how to read much more like a human.
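The two-word idea can be modelled in a few lines of Python. This is a toy model only, not Google's implementation: the class name `TwoWordCaptcha`, the `votes_needed` threshold, and the simple majority vote are all assumptions made for illustration, and the real service's statistical analysis is certainly more involved.

```python
from collections import Counter, defaultdict


class TwoWordCaptcha:
    """Toy model: a known control word gates guesses at an unknown scan."""

    def __init__(self, votes_needed=3):
        # Hypothetical threshold: matching guesses required before a
        # transcription of the scanned word is accepted.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # scan id -> guess -> count

    @staticmethod
    def _norm(word):
        return word.strip().lower()

    def submit(self, control_answer, control_guess, scan_id, scan_guess):
        """Return True if the user passed; only then count their guess."""
        if self._norm(control_guess) != self._norm(control_answer):
            return False  # failed the known word, so discard the guess
        self.votes[scan_id][self._norm(scan_guess)] += 1
        return True

    def transcription(self, scan_id):
        """Return the agreed reading of a scan, or None if undecided."""
        counts = self.votes.get(scan_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Each call to `submit` stands for one solved CAPTCHA. Only users who get the control word right contribute a vote, which is what keeps spam bots from polluting the transcription data.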

Stopping spam and adding to the effort to digitize older, printed knowledge. What could be better than that?

Tags: Technology

About the Author

Brian Hassel is a software developer with over a decade of experience designing and building enterprise solutions.

His current focus is exploring novel designs to better integrate data-centric systems. Many of these ideas have been commercialized in the Tesseract Framework™, offered by his current collaborative venture, 4DIQ.
