In 1976, Optical Character Recognition (OCR) was invented and commercialized by Raymond Kurzweil as a software tool to convert printed words into data characters. Initially, the invention was aimed at blind and visually impaired readers, so that printed books and other text could be converted for text-to-speech applications. Kurzweil’s company, Kurzweil Computer Products, developed the initial products for the document production industry, which embraced the technology mostly to convert reams of paper into digital data. It was adopted quickly by those who needed to edit original text that existed only in “printed page” form.
Early commercial systems based on the Kurzweil invention paired scanners equipped with automatic document feeders (another great invention for document scanning) with the OCR software. The marriage of the Kurzweil invention and Xerox produced the Xerox Kurzweil K-5000 in 1988, one of the first commercial high-speed scanning and OCR text conversion systems. Statco was an early adopter of this technology and acquired this system in 1989. In the early 1990s we scanned and converted thousands of legal and other documents into MS Word and WordPerfect files at over $1 per page. Today, the same service is delivered at $0.05 per page.
OCR conversion quality has improved substantially since 1990. In the beginning, character accuracy depended on the quality and clarity of the scanned pages, plus the ability of the technology to “match” the characters printed on the page against defined character fonts. This process might accurately convert only 65%-80% of the printed characters on each page. In 1990, this was still a better option than re-typing the entire page into a word processing program, but today OCR accuracy is 85% to 95% or better.
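The accuracy percentages above do not say how they were measured. One common way to score OCR output, shown here only as a minimal sketch, is to compare it with a hand-checked reference text and count the character edits (substitutions, insertions, deletions) needed to turn one into the other. The function names and the sample strings are illustrative assumptions, not part of any Statco or Kurzweil system.

```python
def edit_distance(reference: str, candidate: str) -> int:
    """Levenshtein distance: minimum single-character edits between two strings."""
    previous = list(range(len(candidate) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, cand_char in enumerate(candidate, start=1):
            substitution_cost = 0 if ref_char == cand_char else 1
            current.append(min(
                previous[j] + 1,                      # delete a reference character
                current[j - 1] + 1,                   # insert a candidate character
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters recovered correctly, clamped to [0, 1]."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: typical confusions such as h/b, i/1 and w/vv.
truth = "The quick brown fox"
scanned = "Tbe qu1ck brovvn fox"
accuracy = character_accuracy(truth, scanned)
print(f"{accuracy:.1%}")
```

Under this measure, a page scored at 80% still needs roughly one correction for every five characters, which is why the early results were only marginally better than re-typing.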
The other incredible application of OCR technology is converting information from repetitive data forms into structured databases. Medical claim forms, payment vouchers, surveys and other forms are scanned and processed using pre-defined data field locations on the printed page, which are mapped to database fields in a file. This data capture methodology is a very efficient use of OCR technology, since only the portions of the printed page that contain the target data fields are processed.
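The field-mapping idea can be sketched in a few lines. The example below assumes an OCR engine has already returned each recognized word with the pixel coordinates of its center; the zone coordinates, field names, and sample words are hypothetical and stand in for a real form template. Words that fall inside a pre-defined zone are collected into the matching database field, and everything else on the page is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A pre-defined rectangle on the form, tied to one database field."""
    field: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word and the pixel position of its center (assumed OCR output)."""
    text: str
    x: int
    y: int


def extract_fields(words, zones):
    """Map recognized words to database fields by the zone they fall in."""
    collected = {zone.field: [] for zone in zones}
    # Simple reading order: top to bottom, then left to right.
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        for zone in zones:
            if zone.contains(word.x, word.y):
                collected[zone.field].append(word.text)
                break  # a word belongs to at most one zone
    return {field: " ".join(parts) for field, parts in collected.items()}


# Hypothetical template for a claim form.
CLAIM_FORM_ZONES = [
    Zone("patient_name", left=100, top=50, right=500, bottom=90),
    Zone("claim_number", left=520, top=50, right=800, bottom=90),
    Zone("amount_billed", left=520, top=400, right=800, bottom=440),
]

words = [
    Word("Doe", 210, 70),
    Word("Jane", 140, 70),
    Word("A-10442", 600, 68),
    Word("125.00", 640, 420),
    Word("Instructions", 300, 700),  # outside every zone, so it is skipped
]

record = extract_fields(words, CLAIM_FORM_ZONES)
print(record)
```

The resulting dictionary has one entry per field and could be written directly to a database row. In practice the zones would more likely be used to crop the image before recognition, so that only those regions are processed at all.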
The invention of OCR technology has dramatically expanded access to legacy document content through broad-reaching “search” applications over internet and intranet networks. Delivering this content to the general public would have been almost impossible without OCR technology.