OCR scanners eliminated
Natt Piyapramote, a 17-year-old student from Ratchaburi province, never thought that his technology development project would be judged the best among 1,500 entries from 40 countries at the world's largest science competition - the Intel International Science and Engineering Fair (Intel ISEF 2006) - held in the United States earlier this month.
The student from Sarasit Phitayalai School spent six months developing image binarisation software and won the Grand Award for his statistical-based Adaptive Binarisation for Document Imaging project.
The project involved developing software that uses binarisation techniques to convert a document image captured with an ordinary digital camera into a black-and-white image file for use with Optical Character Recognition (OCR), a kind of software that normally converts a scanned document image file into a text file that users can edit.
"Normally, a document image file that uses OCR comes from a scanner, but I had an idea to make the process simpler by using a digital camera, which most of us already have, to capture the image for further OCR processing.
With this method we can bypass the use of the scanner," he said.
However, he realised that images from a digital camera were not of good enough quality for OCR software to recognise and convert into a text file, so he adopted a binarisation technique to solve the problem.
Image binarisation is a technique that converts a document image into a black-and-white image, which can then be converted to text using OCR.
Natt said that by adopting the binarisation process, image files captured from a digital camera are rendered with the same quality as those scanned conventionally.
With its higher quality, the file can then be fed to OCR software.
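The idea of adaptive binarisation can be illustrated with a short Python sketch. It is not Natt's algorithm, which the article does not describe; it is a generic local threshold in the style of Sauvola's method, where each pixel is compared with a threshold computed from the mean and standard deviation of its neighbourhood. The function names, the window size and the constants k and r are assumptions chosen for illustration, and the test page is synthetic.

```python
from math import sqrt


def binarise_local(gray, window=5, k=0.2, r=128.0):
    """Binarise a greyscale image given as rows of 0..255 integers.

    Illustrative Sauvola-style local threshold (an assumption, not the
    method from the article): threshold = mean * (1 + k * (std / r - 1)),
    with mean and std taken over a window around each pixel.
    Returns rows of 0 (black) and 255 (white).
    """
    height, width = len(gray), len(gray[0])
    half = window // 2

    # Summed-area tables of values and squared values, padded with a
    # leading row and column of zeros, so any window sum costs four lookups.
    sums = [[0] * (width + 1) for _ in range(height + 1)]
    squares = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            v = gray[y][x]
            sums[y + 1][x + 1] = (v + sums[y][x + 1]
                                  + sums[y + 1][x] - sums[y][x])
            squares[y + 1][x + 1] = (v * v + squares[y][x + 1]
                                     + squares[y + 1][x] - squares[y][x])

    def box(table, top, left, bottom, right):
        # Sum over rows top..bottom-1 and columns left..right-1.
        return (table[bottom][right] - table[top][right]
                - table[bottom][left] + table[top][left])

    result = []
    for y in range(height):
        top, bottom = max(0, y - half), min(height, y + half + 1)
        row = []
        for x in range(width):
            left, right = max(0, x - half), min(width, x + half + 1)
            count = (bottom - top) * (right - left)
            mean = box(sums, top, left, bottom, right) / count
            mean_sq = box(squares, top, left, bottom, right) / count
            std = sqrt(max(mean_sq - mean * mean, 0.0))
            threshold = mean * (1.0 + k * (std / r - 1.0))
            row.append(0 if gray[y][x] < threshold else 255)
        result.append(row)
    return result


def shaded_page(width=20, height=9):
    """Synthetic page: paper darkens from left to right, with two ink marks."""
    page = [[200 - 6 * x for x in range(width)] for _ in range(height)]
    for x in (3, 16):
        page[4][x] -= 80  # ink is 80 grey levels darker than the paper under it
    return page
```

On the synthetic page, the ink mark on the bright side (grey level 102) is lighter than the bare paper on the shaded side (grey level 86), so no single global threshold can separate ink from paper. Calling binarise_local(shaded_page()) judges each pixel against its own surroundings instead, which is why uneven lighting in a camera photo matters less with this kind of approach. A hard-edged shadow would still confuse this simple version.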
Image binarisation is not something new, but what made Natt's development unique is that it's automated. "Just put the image file into a computer and have the software proceed. The outcome is automatic with no manual involvement," he said.
Natt's image binarisation software has a 14 per cent error rate, which he said is acceptable.
He also compared his software with the world's best existing binarisation technique and found that it was just 5 per cent less efficient, while offering faster processing and more automation.
The development will not end there. Natt said that he plans to improve his software so that it offers a lower error rate and converts images more efficiently.
"I will work on the software and hope to reduce the error rate to only 5 per cent," he added, with the aim of making it widely acceptable.
Natt also plans to put the software on the Web, where people who want to convert document image files into text files without a scanner can use the service free of charge during pilot tests.
"In the initial stages, I will offer the service through my school's website, and I expect to launch it a month later. Meanwhile, I will talk to the National Electronics and Computer Technology Centre (Nectec) to integrate this service into its website as well," he said.
Natt's project was part of Nectec's Young Scientist Competition 2006, where it also won first prize. At Intel ISEF 2006, the project also received awards from the Association for Computing Machinery in the United States as well as from the American Association for Artificial Intelligence as the best project in the area of computer science with an artificial intelligence component.