Help Needed to Digitize Books: reCAPTCHA

Posted on: July 28, 2007

Most tech-savvy people have heard of CAPTCHAs, and almost everyone has solved one while signing up for an online service: the image of distorted text that you have to type in.

A CAPTCHA is a program that tells whether its user is a human or a machine by posing a simple pattern-recognition test (visual or audio) that humans pass easily but programs find hard. It is used to reduce spam and to prevent abuse by automated programs such as bots.

Usually people just type the text shown in the image, and solving a CAPTCHA takes a human very little time (around 10 seconds). Multiplied across everyone who solves them, that time could be put to constructive use, and Carnegie Mellon University came up with an idea to do exactly that.

You may be aware of OCR (Optical Character Recognition) software, which can be used to digitize old books. However, OCR is not very reliable at recognizing every word in old manuscripts and books.

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
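The process described above can be sketched in a few lines of Python: words the OCR engine is confident about are accepted directly, the rest are queued as CAPTCHA challenges, and human answers are tallied until enough of them agree. This is not reCAPTCHA's actual implementation. The `OcrWord` and `DigitizationQueue` names, the 0.80 confidence cutoff, and the rule that three matching answers settle a word are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical cutoff: OCR output below this confidence is treated as "unreadable".
CONFIDENCE_THRESHOLD = 0.80
# Hypothetical rule: this many matching human answers settle a word.
VOTES_NEEDED = 3


@dataclass
class OcrWord:
    word_id: int
    text: str          # the OCR engine's best guess
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


@dataclass
class DigitizationQueue:
    accepted: dict = field(default_factory=dict)  # word_id -> final text
    pending: dict = field(default_factory=dict)   # word_id -> OcrWord shown as a CAPTCHA
    votes: dict = field(default_factory=lambda: defaultdict(Counter))

    def ingest(self, words):
        """Accept confident OCR output; queue the rest for humans."""
        for w in words:
            if w.confidence >= CONFIDENCE_THRESHOLD:
                self.accepted[w.word_id] = w.text
            else:
                self.pending[w.word_id] = w  # would be rendered as a CAPTCHA image

    def record_answer(self, word_id, answer):
        """Count one human answer; accept the word once enough answers agree."""
        if word_id not in self.pending:
            return  # unknown or already settled word: ignore
        normalized = answer.strip().lower()
        self.votes[word_id][normalized] += 1
        if self.votes[word_id][normalized] >= VOTES_NEEDED:
            self.accepted[word_id] = normalized
            del self.pending[word_id]


if __name__ == "__main__":
    q = DigitizationQueue()
    q.ingest([OcrWord(1, "the", 0.99), OcrWord(2, "rnorning", 0.42)])
    for answer in ["morning", "Morning ", "moming", "morning"]:
        q.record_answer(2, answer)
    print(q.accepted)  # {1: 'the', 2: 'morning'}
```

Lowercasing answers and waiting for several agreeing responses are simplifications meant to tolerate typos (note how the stray "moming" is outvoted). The sketch deliberately leaves out how the system decides that an answer came from a human in the first place.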

You can help them digitize books by signing up at their site and using their CAPTCHA code in your own sites and programs.

