By Jeremy Wagstaff
Whatever useful stuff the good guys come up with, the bad guys ain't far behind.
You've probably come across those little boxes on Web pages where you have to figure out the letters or numbers that are obscured by lines, or at odd angles, or otherwise tricky to decipher.
They're called CAPTCHAs -- short for "Completely Automated Public Turing test to tell Computers and Humans Apart" -- and are designed to do just that.
Since spammers and others write software that automates signing up for, say, email accounts, or that places spam in the comments sections of blogs, CAPTCHAs try to separate the human from the non-human by forcing each visitor to decipher something that, in theory, only a real person could.
In reality there are plenty of efforts to get around this, usually by using optical character recognition, or OCR, the same technique used by your scanner when it converts a newspaper article or book page to text your computer can understand.
CAPTCHA has its critics, not least because it wastes time. In fact, about 60 million of those nonsensical jumbles are solved every day, taking about 10 seconds each to decipher and type in. That's more than 150,000 hours of deciphering and typing every day.
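For anyone who wants to check the sums, here is a quick sketch in Python. The two inputs are the figures quoted above; the variable names are mine.

```python
# Figures quoted in the column: CAPTCHAs solved per day and seconds spent on each.
captchas_per_day = 60_000_000
seconds_each = 10

total_seconds = captchas_per_day * seconds_each
total_hours = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

So "more than 150,000 hours" is, if anything, a conservative rounding.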
That inspired some Carnegie Mellon researchers to put the time to good use. Their idea: harness people's deciphering to help digitize books for an online project called the Internet Archive, by having them read the problem words that ordinary OCR can't convert.
Those words are sent out as CAPTCHAs, and the results are then fed back into the scanning engine.
Here's the neat bit, though, as explained on the Carnegie Mellon website:
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
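The pairing trick described above can be sketched in a few lines of Python. This is only an illustration of the idea as the quote describes it, not Carnegie Mellon's actual code: the function names, the vote counter, and the thresholds (`min_votes`, `min_share`) are all assumptions of mine.

```python
from collections import Counter

def check_submission(known_answer, typed_known, typed_unknown, votes):
    """Count a guess for the unknown word only if the control word was solved."""
    if typed_known.strip().lower() != known_answer.strip().lower():
        return False  # control word failed: discard the guess for the new word
    votes[typed_unknown.strip().lower()] += 1
    return True

def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed reading, or None if there is not yet enough agreement.

    min_votes and min_share are hypothetical thresholds chosen for illustration.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None

# Four users see the known word "morning" next to one the OCR could not read.
votes = Counter()
check_submission("morning", "morning", "upon", votes)
check_submission("morning", "mourning", "open", votes)  # control failed, ignored
check_submission("morning", "Morning", "upon", votes)
check_submission("morning", "morning", "upon", votes)

print(consensus(votes))  # prints "upon"
```

The point of the control word is that the system never needs to know the answer to the new word in advance; it only needs enough people who have proved themselves on the known word to agree with one another.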
An excellent idea, and I've since seen it deployed on quite a lot of websites. Sadly, it didn't take long for the bad guys to figure out they could use the same approach. Now a sleazeball has found a way to get folk to decipher CAPTCHA texts for him, through a small program, delivered by Trojan, that offers a striptease in exchange for guessing the texts correctly.
The unwitting victim finds the program on his computer, disguised as a striptease game, according to antivirus manufacturer Trend Micro, wherein a scantily clad "Melissa" agrees to take off some of her clothing. "However," Trend Micro reports, "for her to strut her stuff, users must identify the letters hidden within a CAPTCHA. Input the letters correctly, press 'go' and 'Melissa' reveals more of herself."
You may see a bit more of Melissa, but at the same time the answers are sent to a remote server, where the bad guy uses the results to identify and match ambiguous CAPTCHA images from legitimate sites.
Much as Carnegie Mellon links the responses to its CAPTCHA images to the text in scanned books, the bad guy links those images to, in this case at least, pages for signing up for Yahoo! email addresses. (Why would a spammer want those addresses? To use them as addresses from which to spam other people, probably, either directly or as the log-on addresses for other websites which require a legitimate email address.)
CAPTCHAs may be irritating, but they do help to deter spam. Don't believe me? More than 90 percent of comments submitted to blogs are spam, and if you don't see much spam on blogs, it's because it has been weeded out, either by CAPTCHA or by other methods.
I have CAPTCHA on my blog, and while a few people complain, I get a lot less spam than I used to. Even so, a few spammers have found a way through. With Melissa and her gang, now I think I know how they do it.
Jeremy Wagstaff can be found online at www.loosewireblog.com or via email at jeremy@loose-wire.com.