Form Spam Prevention
Posted by Joe Rebis (Import) on 15 March 2007 03:04 AM
The fight continues... It's time to tighten up your blogs and forms by implementing security steps to avoid spam.
The proliferation of blogs and web-based contact forms, combined with developments in spam software over the last couple of years, has made it quite easy for spammers to submit advertising messages via your web-based forms. This may include your website's contact form and any unmoderated blog comment forms. Here at EPhost Web Hosting we have noticed a sharp increase in the number of spam messages sent in this fashion. In one instance we saw an unmoderated blog comment section that had over 100 different spam messages! Fortunately, there are several things you can do to reduce or eliminate such problems.
Spammers use automated software to submit advertising messages to you via your web-based form. To see what this kind of automation looks like, check out the Google Toolbar feature called "Auto-Fill". The Google Toolbar is NOT spam software, but it shows how the approach works. When you are on any web-based form (and have enabled this useful feature in the Google Toolbar), just press "Auto-Fill" and your data is automatically entered into the form fields. Imagine the same system running as a completely automated spider, crawling the web looking for web-based forms to submit.
Many blog systems (e.g. WordPress) and even the EPhost Support Center include the ability to post comments on your published content. Comment sections invite useful and valuable feedback for the authors of the content and, well, are just fun. Unfortunately, they also tend to be abused, not only by spam but also by hacking attempts. Many blog systems make user registration and comment moderation optional. This means that anyone can post a publicly viewable message to your blog without being approved by you. There are several ways to make spam harder, and often all it takes is a simple configuration change such as turning on comment moderation, requiring a login before posting, or adding a CAPTCHA.
For example: WordPress is the most popular "blogging" platform. See the WordPress documentation on how to enable Comment Moderation.
CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart" and is trademarked by Carnegie Mellon University. It's a challenge-response system designed to tell whether the user is human or automated. With this system, a user is presented with several distorted characters in an image and must type them into a form field before being allowed to submit the form. Since automated software generally cannot read the letters from the image, it cannot submit the form. No doubt you have seen this before, where you have to type in the crazy letters you see in an image on screen before being able to submit the form. Allegedly, some automated systems have been able to bypass these features using a dictionary-based attack. Sometimes spammers simply use a half-automated, half-human system for sending their messages. When a CAPTCHA is defeated, the problem is usually with how it was implemented. Most programming languages have tags or libraries available to implement CAPTCHA. We have such a tag available for ColdFusion, pre-installed, called CFX_CAPTCHA.
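To make the challenge-response idea concrete, here is a minimal sketch in Python (used purely for illustration; it is not how CFX_CAPTCHA works internally). It assumes the server draws the challenge text into an image, which is not shown, and places a signed token in a hidden form field. The names SECRET_KEY, new_challenge and check_response are hypothetical.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Leave out characters that are easy to confuse once drawn into an image.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def _sign(answer: str) -> str:
    """HMAC of the normalised answer, so the form never carries the answer itself."""
    normalised = answer.strip().upper().encode("utf-8")
    return hmac.new(SECRET_KEY, normalised, hashlib.sha256).hexdigest()


def new_challenge(length: int = 6):
    """Return (text to draw as an image, token to place in a hidden field)."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return text, _sign(text)


def check_response(typed: str, token: str) -> bool:
    """True only if what the visitor typed matches the signed challenge."""
    expected = _sign(typed).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

This sketch leaves out the image rendering and does nothing to stop a solved token from being reused, so a real implementation would also need to expire each challenge after one use. Gaps like that are the kind of implementation problem mentioned above.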
One commonality in this spamming technique is that the automated system may need to determine which field is the "email address" so that it can get past any email validation you have put in place. More than any other field, an email address has a very specific structure. Spammers don't care whether the address they use works or not, because the form usually sends the message to you. However, even a non-working email address may get them the results they want. This is because many forms send a copy or auto-response to the person submitting the form. So a side effect is that the message is also sent to another user. If that user doesn't exist, no problem: the message is bounced back to our mail servers, but not before hitting the intended recipient's mail server too. Needless to say, the message has the potential to get around. There are two specific things you can do in this regard.
First, you can rename the email field in your web page's HTML code to something other than "email_address". You can see how this works by testing your form against the Google Toolbar "Auto-Fill" feature. If it's able to figure out which field is your email address, then so can the spammers. Yes, this will keep users from being able to "Auto-Fill" their email address using the Google Toolbar, but it's a small price to pay. See Google's Toolbar Help for more information on how it determines which field is which. Suffice it to say that pattern matching is done against the form field NAME and other elements, including ALT text, to determine which type of data is needed. It's not very hard to do. So, if you name your field something like "6546546" and it has no other references to the word "email" or "address" (or any variation), it will be hard to tell whether that field is meant for email or not. Keep in mind that if they can match all the other fields in your form, then the remaining one is probably the email address field (Sherlock Holmes logic). I think you see the point.
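As a rough illustration of how little effort this pattern matching takes, the Python sketch below guesses the email field from field names and label text. It is an assumption about how such tools might behave, not the actual logic of the Google Toolbar or any spam program. The hint pattern and function names are hypothetical, and the process-of-elimination trick is not modeled.

```python
import re
from typing import Dict, Optional

# Hypothetical hint a form-filling bot might search for; real tools
# likely use much longer lists of patterns.
EMAIL_HINT = re.compile(r"e[-_ ]?mail", re.IGNORECASE)


def looks_like_email_field(name: str, label: str = "") -> bool:
    """Guess from the field NAME and nearby label text alone."""
    return bool(EMAIL_HINT.search(name) or EMAIL_HINT.search(label))


def guess_email_field(fields: Dict[str, str]) -> Optional[str]:
    """fields maps each field name to its label text.

    Returns the first field that gives itself away, or None.
    """
    for name, label in fields.items():
        if looks_like_email_field(name, label):
            return name
    return None
```

With a field named "email_address" the guess succeeds immediately. With a field named "6546546" and no label it fails, but a label such as "Your e-mail" next to that field would give it away again, which is why the surrounding text matters as much as the name.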
Second, you NEED to perform some general email address validation and error trapping. This is when the script checks that all of the "earmarks" of a valid address appear in the submitted form, e.g. the field is not blank and contains an @ sign and at least one period. Not only will this help in the fight against spam, we REQUIRE that you do so. We require this because websites that don't trap for invalid email addresses result in tons of unnecessary email processing. Have you ever received a blank form email submission? Sometimes even legitimate search engine spiders can trip the form if the submit button is a link instead of a button. Implementation is very simple. Again, most programming languages have built-in form validation.
WordPress example: If you are coding for WordPress rather than using a plugin, refer to the WordPress Codex for Plugin API/Filters.
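The basic "earmarks" check described above can be sketched in a few lines. This Python version is only an illustration of the idea (not blank, an @ sign, at least one period in the domain); the function name is hypothetical, and it does not prove that an address exists or can receive mail.

```python
def is_plausible_email(value: str) -> bool:
    """Cheap sanity check to run before any email is sent from a form."""
    value = value.strip()
    # Reject blank submissions and anything containing spaces or line breaks.
    if not value or any(ch.isspace() for ch in value):
        return False
    local, at_sign, domain = value.partition("@")
    # Require exactly one @ with something in front of it.
    if not at_sign or not local or "@" in domain:
        return False
    # The domain needs at least one period with text on both sides of it.
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)
```

A form handler would call this first and redisplay the form with an error message when it returns False, so that blank or malformed submissions never reach the mail server.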
Today, it's possible to implement a GEO-based IP test to block form submissions from other parts of the world. Most U.S.-based companies don't truly need to receive form requests from all over the world. Alternatively, you can use conditional checks to confirm the email address on a second page or to take a different action based on the country found. GEO-based IP checks are not 100% accurate, as the data changes very often. They are thought to be approximately 80%+ accurate.
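The conditional check could look something like the Python sketch below. Everything in it is an assumption for illustration: the lookup table uses reserved documentation-only address ranges with made-up country assignments, and a real site would query a regularly updated GEO IP database instead. Because the data is imperfect, the sketch asks for confirmation on a second page rather than rejecting outright.

```python
import ipaddress
from typing import Optional

# Hypothetical table: documentation-only networks with invented countries.
# Replace with a lookup against a real, regularly updated GEO IP database.
NETWORK_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "CA",
    ipaddress.ip_network("203.0.113.0/24"): "FR",
}

# Countries whose submissions are accepted without an extra step (assumed).
ALLOWED_COUNTRIES = {"US", "CA"}


def country_for(ip: str) -> Optional[str]:
    """Return the country code for an address, or None if unknown or invalid."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return None
    for network, country in NETWORK_TO_COUNTRY.items():
        if address in network:
            return country
    return None


def decide(ip: str) -> str:
    """Return 'accept' for allowed countries, otherwise 'confirm'."""
    if country_for(ip) in ALLOWED_COUNTRIES:
        return "accept"
    # Unknown or foreign visitor: send them to a confirmation page.
    return "confirm"
```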
To be most effective in fighting spam, you'll want to make sure that there are NO text-based references to your email address on your website. Email harvesting programs look for such things. An email harvester is a type of automated software that looks for email addresses on your website and records them for later use. Often it is used in conjunction with the automated software described above, so spammers can submit your form to you or use your email address in someone else's form. You can tell an address is text if you can select it with your mouse. Be sure to remove all references in your HTML as well, e.g. HREFs to "mailto:email@example.com". The preceding example is itself a text-based email address. It is best to change these references to a clickable image that leads to your "new and improved" web-based form. If you must use text, then try to write it like this: "y o u [at] something . com".
If you want to join the "offensive" team (we don't recommend bringing that attention to yourself), you can look into setting a "honeypot" trap. This is where a website uses methods to deliver bogus data to email harvesting programs. The offending harvesting program thinks it has found a "honeypot" of email addresses. The goal is to make it "unprofitable" for spammers to engage in this sort of activity by wasting their time. A secondary goal is to help log, track and even direct such activity. For instance, a honeypot could be set up to purposely collect spam in a single email account so that anti-spam software can filter against that data.
See: Wikipedia Honeypot
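Returning to the text-based addresses discussed above, one way to audit a page is to scan its HTML the way a simple harvester might. The Python sketch below is an assumption about how such tools work rather than a copy of one. Its pattern is deliberately simple, and the function names are hypothetical. It also shows the spaced-out style of writing an address.

```python
import re
from typing import List

# Deliberately simple pattern, similar in spirit to what a harvester might use.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def find_text_emails(html: str) -> List[str]:
    """Return each address found in the markup, in order, without duplicates."""
    found: List[str] = []
    for address in EMAIL_PATTERN.findall(html):
        if address not in found:
            found.append(address)
    return found


def obfuscate(address: str) -> str:
    """Rewrite an address with spaced letters, [at], and spaced periods."""
    local, _, domain = address.partition("@")
    return f"{' '.join(local)} [at] {domain.replace('.', ' . ')}"
```

Running find_text_emails over a page's source lists every address that still needs to be replaced with an image or form link, including those hidden inside mailto: links.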
In conclusion, we hope that you will take the time to implement these countermeasures to combat such activity. Doing so will greatly improve the quality of your website, reduce time wasted reviewing bogus form submissions, and reduce server load, which helps all users. Your developer may not have implemented such features because most clients (present company excluded) are unwilling to pay for the additional coding time, or because they felt it was unnecessary since they don't get many visitors. Just as likely, some of these techniques may not have been around when your form was created all those years ago. Simply put, these techniques are necessary! Ironically, it's usually the website that doesn't get many visitors that is the perfect target for spammers, because little attention is being paid to what's happening on that site.
Please log in to your blog today and enable comment moderation, and don't forget to contact your Web Developer to check your forms!