by Brian Hassel
15. June 2011 13:08
I recently noticed a huge increase in the volume of blog-comment spam. Though some entries provided a few laughs with their poor grammar, misspellings, and insults, it was clear that their only purpose was to provide a link back to the spammer's website. After a few late-night notification emails woke me up, I decided to implement Google's reCAPTCHA service. BlogEngine.NET makes this very easy, but other websites can also use the reCAPTCHA control without much difficulty, since it is provided as a web service.
The coolest thing about reCAPTCHA, besides its being free, is that every time someone completes a CAPTCHA, they also help improve optical character recognition (OCR). In a nutshell, the user is presented with two obfuscated words. In most cases, one word is already known, and the other comes from a scanned book or newspaper. If the user types the known word correctly, their 'guess' at the other word is used as data for the OCR statistical analysis engine. Collect enough data this way, and the OCR software 'learns' to read much more like a human.
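To make the two-word idea concrete, here is a toy model in Python. It is a hypothetical simplification, not Google's actual implementation: the class name `ToyReCaptcha`, the case-insensitive comparison, and the rule of accepting the most common guess once it reaches a minimum vote count (`min_votes`) are all assumptions for illustration. The known word alone decides pass or fail, and only users who pass get to "vote" on the unknown word.

```python
from collections import Counter, defaultdict


class ToyReCaptcha:
    """Hypothetical, simplified model of the known-word/unknown-word scheme."""

    def __init__(self, min_votes=3):
        # Assumed threshold: a transcription is accepted once it has this many votes.
        self.min_votes = min_votes
        # Maps an unknown word's image id to a Counter of normalized guesses.
        self.votes = defaultdict(Counter)

    def submit(self, known_answer, known_guess, unknown_id, unknown_guess):
        """Return True if the user passed; record their unknown-word guess only then."""
        passed = known_guess.strip().lower() == known_answer.strip().lower()
        if passed:
            # Only trust guesses from users who solved the known word.
            self.votes[unknown_id][unknown_guess.strip().lower()] += 1
        return passed

    def transcription(self, unknown_id):
        """Return the leading guess if it has enough votes, else None."""
        counts = self.votes.get(unknown_id)  # .get avoids creating an empty entry
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        return word if n >= self.min_votes else None


captcha = ToyReCaptcha(min_votes=3)
for guess in ["morning", "morning", "mourning", "morning"]:
    captcha.submit("overlooks", "overlooks", "scan-0042", guess)
# This user mistypes the known word, so their guess is ignored.
captcha.submit("overlooks", "overIooks", "scan-0042", "m0rning")
print(captcha.transcription("scan-0042"))  # → morning
```

Gating the vote on the known word is what lets a CAPTCHA double as a data-collection tool: a bot that cannot read the control word also cannot pollute the transcription of the scanned one.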
Stopping spam and adding to the effort to digitize older, printed knowledge. What could be better than that?
reCAPTCHA
Tags:
Technology