Would you trust a computer to read your most precious documents? Surprisingly, some risk-averse government agencies are relying on computers to tell them what's written on their documents and forms. Tax returns, recreational license applications, police reports, traffic citations and court dockets are just a few examples of the many forms and documents computers are reading these days.
Both the number of agencies using recognition systems and the types of documents computers are reading have grown steadily in recent years. States and localities are taking advantage of improvements in hardware and software that, over the years, have generated significant benefits. For example, a number of state revenue agencies now use recognition software to read tax returns. As a result, they have dramatically reduced processing costs.
How dramatic can it get? A recognition system used in high-volume applications can reduce data-entry costs by as much as 70 percent, according to Arthur Gingrande Jr., a partner with Imerge Consulting. Properly applied, a recognition system can shrink an organization's data-entry labor force by as much as 60 percent.
Recognition technology also delivers indirect benefits to workers, such as fewer cases of carpal tunnel syndrome, repetitive stress injuries and eyestrain. But these benefits don't come easily. The harder it becomes to read the characters on a form or document, the more likely an error will occur. More errors mean more time spent manually correcting what the computer misread. With high error rates, benefits can quickly evaporate. Difficult-to-read documents and forms also require more expensive solutions. If your agency wants computers to read handwriting, expect to pay a bundle to get it done.
ACCEPT NO SUBSTITUTE
Recognition software, referred to as OCR (optical character recognition) and ICR (intelligent character recognition), has steadily grown in importance as a subset of document imaging technology. Both technologies convert visually readable characters into ASCII text, which a computer can store, edit and process. OCR, which was developed first, recognizes type fonts by pattern matching, character assessment and a crude learning process. ICR reads hand printing and, to a lesser extent, handwriting.
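To make the idea of pattern matching concrete, here is a toy sketch in Python. It is not how any particular OCR product works: the 3-by-3 glyph templates, the `recognize` function and the 0.8 threshold are all hypothetical, chosen only to show a scanned shape being compared against stored character patterns and either accepted or flagged as unreadable.

```python
# Hypothetical 3x3 bitmaps ("1" = ink, "0" = blank). Real engines work
# with much larger bitmaps and many fonts; this is only an illustration.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    cells = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in cells) / len(cells)


def recognize(glyph, threshold=0.8):
    """Pick the best-matching template; return None if the match is weak.

    A None result stands for a character the software knows it could not
    read. The threshold value is an assumption made for this sketch.
    """
    best_char, best_score = max(
        ((char, match_score(glyph, template))
         for char, template in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return None, best_score
    return best_char, best_score


clean_t = ("111", "010", "010")
smudged_l = ("100", "100", "110")  # one pixel missing from the "L"
print(recognize(clean_t))
print(recognize(smudged_l))
```

In this sketch a slightly damaged character still clears the threshold, while a shape that resembles nothing in the template set is rejected rather than guessed.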
OCR, which requires less computing power than ICR to recognize type fonts, can be installed in low-end imaging systems for a few hundred dollars. Customers can purchase special scanners with OCR built in, so recognition takes place as the typed documents are scanned. ICR, on the other hand, requires lots of computing horsepower to recognize and read hand-printed letters and numbers. While some low-end versions of ICR exist, their results can be quite dismal. Apple Computer installed a simple version of ICR on its handheld computer, the Newton, that ended up making recognition software look like a bad joke.
What made some of the Newton's attempts at reading hand-printed characters so funny was something called contextual editing -- one of the many tools employed in recognition systems to bolster the accuracy of OCR and ICR. Accuracy is the holy grail of recognition technology -- impossible to reach, but always sought after.
OCR and ICR software measures accuracy based on the mistakes it knows it made. When the software can't decipher a character, it will highlight the error and, at the end of the job, present the user with the percentage of errors it made. The biggest problem with recognition software lies with substitution errors. These occur when the OCR or ICR software (called the engine) is convinced it has read a character correctly when, in fact, it's wrong. An OCR engine claiming 98 percent accuracy may actually have a true accuracy rate of only 93 percent when substitution errors are factored in.
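The gap between claimed and true accuracy is simple arithmetic, sketched below in Python. The character counts are invented for illustration (10,000 characters, 200 flagged by the engine, 500 silently misread); they were picked only so the results line up with the 98 percent and 93 percent figures above. The function name `accuracy_rates` is likewise hypothetical.

```python
def accuracy_rates(total_chars, flagged_errors, substitution_errors):
    """Return (reported, true) accuracy as fractions of all characters.

    reported: counts only the errors the engine knows about (flagged).
    true:     also counts substitution errors, which the engine misses.
    """
    reported = (total_chars - flagged_errors) / total_chars
    true = (total_chars - flagged_errors - substitution_errors) / total_chars
    return reported, true


# Illustrative numbers only, not measurements from any real system.
reported, true = accuracy_rates(
    total_chars=10_000,
    flagged_errors=200,
    substitution_errors=500,
)
print(f"reported accuracy: {reported:.0%}")  # reported accuracy: 98%
print(f"true accuracy:     {true:.0%}")      # true accuracy:     93%
```

On these assumed numbers, the engine's own report hides more than two-thirds of its mistakes: 500 of the 700 misread characters never get flagged for manual correction.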
There are three stages in the recognition process that affect accuracy. The prerecognition stage covers everything from the type of paper that will be scanned and the design of the form to the actual