Blog Comment Spam and reCAPTCHA

by Brian Hassel 15. June 2011 13:08

I recently noticed a huge increase in the volume of blog-comment spam. Though some entries provided a few laughs (poor grammar, misspellings, and insults), it was clear that their only purpose was to provide a backlink to the spammer's website. After a few late-night emails woke me up, I decided to implement Google's reCAPTCHA service. BlogEngine.Net makes this very easy, but other websites can use the reCAPTCHA control without much difficulty, since it is provided as a web service.
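For sites without a built-in integration, the server-side half of the check is a single HTTP POST. Below is a minimal Python sketch, assuming the current `siteverify` endpoint (the 2011-era reCAPTCHA API this post used has since been retired). The helper names `build_verify_request`, `parse_verify_response`, and `verify_comment` are hypothetical; your own code supplies the secret key and the token the widget posts in the `g-recaptcha-response` form field.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current reCAPTCHA verification endpoint, not the retired 2011 one.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, response_token, remote_ip=None):
    """Build the form-encoded POST that asks Google whether the token is valid."""
    fields = {"secret": secret, "response": response_token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional: the commenter's IP address
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def parse_verify_response(body):
    """Return True only if the JSON reply explicitly reports success."""
    payload = json.loads(body)
    return bool(payload.get("success", False))


def verify_comment(secret, response_token, remote_ip=None, opener=urllib.request.urlopen):
    """Hypothetical comment-form hook: True if the CAPTCHA was solved."""
    request = build_verify_request(secret, response_token, remote_ip)
    with opener(request, timeout=10) as resp:
        return parse_verify_response(resp.read().decode("utf-8"))
```

A comment handler would call `verify_comment` with the posted token and simply discard the comment when it returns `False`. The `opener` parameter exists only so the network call can be swapped out in tests.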

The coolest thing about reCAPTCHA, besides it being free, is that every time someone completes a CAPTCHA, they also help fill in where optical character recognition (OCR) falls short. In a nutshell, two obfuscated words are presented to the user. In most cases, one is known, and the other comes from a scanned book or newspaper that OCR software could not read reliably. Assuming the user types the known word correctly, their 'guess' at the other word is recorded as a vote for its spelling. Collect enough agreeing votes this way, and the unknown word can be transcribed with confidence, reading the page much more like a human would.
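The two-word scheme can be modeled in a few lines. This is a toy sketch, not Google's implementation: the class name, the case-insensitive comparison, and the agreement threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class ToyRecaptcha:
    """Toy model of the two-word scheme (not Google's implementation)."""

    def __init__(self, threshold=3):
        self.threshold = threshold            # hypothetical: agreeing votes needed
        self.votes = defaultdict(Counter)     # unknown-word id -> tally of guesses
        self.transcribed = {}                 # unknown-word id -> accepted reading

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes; record their guess only if they do."""
        # Only the known (control) word decides whether the user passes.
        if control_typed.strip().lower() != control_answer.lower():
            return False
        # A passing answer counts as one vote for the unknown word's spelling.
        if unknown_id not in self.transcribed:
            tally = self.votes[unknown_id]
            tally[unknown_typed.strip().lower()] += 1
            word, count = tally.most_common(1)[0]
            if count >= self.threshold:
                self.transcribed[unknown_id] = word
        return True
```

Note the design choice: a user who fails the known word is rejected and their guess is thrown away, since there is no evidence a human (rather than a spam bot) produced it.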

Stopping spam and adding to the effort to digitize older, printed knowledge. What could be better than that?


Tags: Technology

Using the Wrong Version Control System is Costing You (Part 1 of 2)

by Brian Hassel 3. June 2011 13:31

Almost every developer worth his or her salt recognizes the importance of using a version control (revision control) solution to manage software source code. For the uninitiated, version control software allows files (typically text-based files) to be managed in terms of revisions. The job of the version control software is to keep a running log of every change or addition to the files under its control.

The benefits of using version control for software code are enormous. Fixes or modifications can be made and ‘checked in’ along with comments explaining why the change was made. If the change introduces new bugs, the versions can be compared (along with the comments about the change) to determine what additional changes need to be made.
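To make ‘check in’ and ‘compare versions’ concrete, here is a toy in-memory history for a single file. It is a hypothetical illustration of the idea, not how Subversion stores data; the comparison uses the standard library's `difflib.unified_diff`, which produces the familiar unified-diff format.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    message: str   # the check-in comment explaining why the change was made
    content: str


@dataclass
class ToyFileHistory:
    """Hypothetical single-file history; every check-in is kept forever."""
    revisions: list = field(default_factory=list)

    def check_in(self, content, message):
        """Store a new revision with its comment and return its number."""
        rev = Revision(len(self.revisions) + 1, message, content)
        self.revisions.append(rev)
        return rev.number

    def log(self):
        """The running log: (revision number, comment) pairs, oldest first."""
        return [(r.number, r.message) for r in self.revisions]

    def diff(self, old, new):
        """Unified diff between two revision numbers, as a list of lines."""
        a, b = self.revisions[old - 1], self.revisions[new - 1]
        return list(difflib.unified_diff(
            a.content.splitlines(), b.content.splitlines(),
            fromfile=f"r{a.number}", tofile=f"r{b.number}", lineterm=""))
```

Reading `log()` next to `diff(1, 2)` is the workflow described above: the comment says why a line changed, and the diff shows exactly what changed.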

Version control is very useful for a single developer, but it becomes an absolute necessity once a project has multiple contributing developers. Most version control systems (VCS) include powerful merging tools that allow changes made to the same file by separate developers to be combined with relative ease. In addition, the running log of changes and comments by other developers helps keep the team tightly integrated.
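The core rule behind a three-way merge fits in a short function: compare each developer's version against the common ancestor, keep whichever side changed, and flag a conflict when both sides changed the same line differently. This Python sketch is deliberately simplified (it assumes no lines were inserted or deleted, which real merge tools handle by first aligning the files with a diff), and `merge_line_by_line` is a hypothetical name.

```python
def merge_line_by_line(base, mine, theirs):
    """Toy three-way merge of three equal-length lists of lines.

    Returns (merged_lines, conflict_line_numbers); line numbers are 1-based.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this toy merge only handles edits that keep the line count")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed it differently: a human must decide
            conflicts.append(number)
            merged.append(f"<<<<<<< mine\n{m}\n=======\n{t}\n>>>>>>> theirs")
    return merged, conflicts
```

For example, if one developer edits the first line of a file and another edits the last, both edits survive with no conflict; only overlapping edits need a person to resolve them.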

So where’s the debate?

One of the most widely used version control systems is Subversion, and with good reason. It’s free and open source, has great client-tool support, and is easy to get started with. Subversion, like many other previous-generation version control systems, employs a client-server model. That is, a central server holds the master database of files and changes (known as the repository), and individual clients are able to pull/push changes as necessary. The server is always the master copy, and all changes must ultimately run through it.
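The ‘everything runs through the server’ rule can be sketched as a toy model in Python. This is hypothetical code, not Subversion's implementation, though it mirrors two behaviors Subversion users will recognize: a commit needs a reachable server, and a commit is rejected as out of date when a file changed on the server after the developer's working copy was last updated.

```python
class ServerUnreachable(Exception):
    """Raised when a client tries to commit while it cannot reach the server."""


class OutOfDate(Exception):
    """Raised when a committed file changed on the server after the client's base revision."""


class ToyCentralServer:
    """Hypothetical model of a client-server VCS: one master copy, one revision counter."""

    def __init__(self):
        self.head = 0            # latest revision number on the server
        self.files = {}          # path -> current content (the master copy)
        self.last_changed = {}   # path -> revision in which the path last changed
        self.reachable = True    # set to False to simulate a disconnected client

    def out_of_date(self, base_revision, paths):
        """Return the paths that changed on the server after base_revision."""
        return sorted(p for p in paths if self.last_changed.get(p, 0) > base_revision)

    def commit(self, base_revision, changes):
        """Apply {path: content} on top of base_revision and return the new revision."""
        if not self.reachable:
            raise ServerUnreachable("no commit is possible without the central server")
        stale = self.out_of_date(base_revision, changes)
        if stale:
            raise OutOfDate(f"update first; changed on server: {', '.join(stale)}")
        self.head += 1
        for path, content in changes.items():
            self.files[path] = content
            self.last_changed[path] = self.head
        return self.head
```

The single `head` counter and the single `files` dictionary are the point: there is exactly one master copy, and nothing is versioned until the server accepts it.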

At first, this sounds like the perfect model. All developers must push their changes to a centrally managed location. There is never any doubt as to which ‘copy’ of a file is the master copy; it’s on the central server.

Unfortunately, this model doesn’t work well when developers lack quick and reliable access to the central repository (the kind of access a local LAN provides) and instead work from remote locations, sometimes disconnected entirely. In addition, large modifications to the code base may take days or weeks to complete. During this period, the code may not compile correctly, or it may exhibit strange behavior.

So if I, a developer tasked with a multi-day project, have made some code modifications and wish to call it a day, I’m faced with a no-win choice. If I do not commit the code to the central repository, I give up one of the primary benefits of using version control. If, on the other hand, I commit my incomplete changes to the server, anyone else who pulls a recent copy will not be able to build the software correctly. Fortunately (somewhat), I can ‘branch’ the code I am working on so that the existing code (commonly called the ‘trunk’) is not left in a state of disarray.
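Branching in a central repository amounts to a cheap copy of the trunk under a new name, so that unfinished commits land somewhere other than the trunk. The Python sketch below is a hypothetical toy (the class and method names are invented for illustration); in Subversion itself this is typically done with `svn copy`, conventionally into a `branches/` directory.

```python
class ToyBranchingRepo:
    """Hypothetical model: each line of development is a list of snapshots."""

    def __init__(self, files):
        # The trunk starts with a single snapshot of the initial files.
        self.lines = {"trunk": [dict(files)]}

    def head(self, line):
        """Latest snapshot (path -> content) on a line of development."""
        return self.lines[line][-1]

    def branch(self, source, name):
        """Start a new line from the source's latest snapshot, like a copy."""
        self.lines[name] = [dict(self.head(source))]

    def commit(self, line, path, content):
        """Record a new snapshot on one line only; other lines are untouched."""
        snapshot = dict(self.head(line))
        snapshot[path] = content
        self.lines[line].append(snapshot)
        return len(self.lines[line]) - 1
```

Everyone building from the trunk keeps getting code that compiles, while the half-finished work is still versioned. What the toy leaves out is the hard part: merging the branch back into the trunk.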

In a perfect world, branching, and merging multiple branches back into the trunk, would be easy. Subversion does not make it so. Anyone who has spent enough time with Subversion has an understandable fear of reintegrating multiple branches. It can be done, but it takes effort and planning. The central model adds a level of complexity and rigor to branching, a feature that is essential to a multi-member development team.

Even if a reliable approach to branching is settled on, Subversion, or more specifically the client-server model, still does not work well for a distributed, sometimes disconnected, team. Clearly, a different approach to version control is required, one designed from the ground up with distributed development in mind…

A follow-up post will discuss distributed version control systems, and why they work better for real-world development.

Tags: Code | Process Management


About the Author

Brian Hassel is a software developer with over a decade of experience designing and building enterprise solutions.

His current focus is exploring novel designs to better integrate data-centric systems. Many of these ideas have been commercialized in the Tesseract Framework™, offered by his current collaborative venture, 4DIQ.

