OCR Software
OCR is a field of research in pattern recognition, artificial intelligence (AI)
and machine vision. Though research in the field continues, the focus has
largely shifted to implementing proven techniques. Optical character
recognition (using mirrors and lenses) and digital character recognition were
originally considered separate fields, but because very few applications
survive that use “true” (or original) optical techniques, the term OCR has now
been broadened to include digital image processing as well.
Basics of OCR
Optical character recognition, or OCR, is the mechanical or electronic
translation of images of typewritten or handwritten text (usually captured
by a scanner) into computer-editable text.
In other words, OCR takes a scanned image of printed or handwritten text and
turns it into text that a computer can edit and search. It is the technology
behind scanners and other hardware that turn “real” text and images into
digital objects.
Development of OCR
Early systems required training (the provision of known samples of each
character) to read a specific font, but now advanced systems with a high
degree of recognition accuracy for most fonts are common.
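The "training" that early systems needed can be pictured as storing known samples of each character and matching every new glyph against them. The sketch below is a minimal nearest-template classifier, assuming the glyphs are already segmented into tiny binary bitmaps and using Hamming distance (the number of differing pixels) as the similarity measure. The 3x3 templates and the function names are hypothetical, chosen only for illustration; real systems used larger images and richer features.

```python
# Minimal sketch of a "trained" recognizer using template matching.
# Assumptions: each glyph is already segmented and binarized into a tiny
# fixed-size bitmap (a tuple of 0/1, row-major); the 3x3 templates are made up.
from typing import Dict, List, Optional, Sequence, Tuple

Bitmap = Tuple[int, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def train(samples: Sequence[Tuple[str, Bitmap]]) -> Dict[str, List[Bitmap]]:
    """'Training' here simply stores the known samples of each character."""
    templates: Dict[str, List[Bitmap]] = {}
    for label, bitmap in samples:
        templates.setdefault(label, []).append(bitmap)
    return templates


def classify(templates: Dict[str, List[Bitmap]], glyph: Bitmap) -> Optional[str]:
    """Return the label of the stored sample with the fewest differing pixels."""
    best_label, best_dist = None, float("inf")
    for label, stored in templates.items():
        for template in stored:
            dist = hamming(template, glyph)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


TEMPLATES = train([
    ("I", (0, 1, 0,
           0, 1, 0,
           0, 1, 0)),
    ("L", (1, 0, 0,
           1, 0, 0,
           1, 1, 1)),
    ("T", (1, 1, 1,
           0, 1, 0,
           0, 1, 0)),
])

# A "T" whose bottom pixel was lost in scanning is still closest to the T template.
print(classify(TEMPLATES, (1, 1, 1,
                           0, 1, 0,
                           0, 0, 0)))  # prints T
```

Because the stored samples come from one font, a glyph drawn in a different typeface can land closer to the wrong template, which helps explain why such systems had to be trained for each font they were expected to read.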
Some systems are even capable of reproducing formatted output that closely
approximates the original scanned page, including images, columns and other
non-textual components.
Accurate recognition of typewritten text in the Latin script is now
considered largely a solved problem. Typical accuracy rates exceed 99%,
although applications that demand even higher accuracy still require human
review for errors.
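A rough worked example shows why even 99% accuracy can call for human review. Assuming, for illustration only, a page of about N = 2,000 characters, and treating accuracy a as the fraction of characters recognized correctly, the expected number of errors E on that page is:

```latex
E = (1 - a)\,N = (1 - 0.99) \times 2000 = 20
```

So even a 99%-accurate engine would leave roughly 20 wrong characters on such a page for someone to catch.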
Recognition of handwriting and hand printing is still the subject of active
research, as is recognition of printed text in other scripts (especially
those with a very large number of characters), which OCR systems are still
being trained to read today.
OCR Software on the Market Today
1. Apple Newton Software
The Apple Newton software is one of the best-known examples of handwriting
recognition. It does not get much credit, because its name does not appear
on the products that build on its approach. The Newton pioneered the kind of
on-the-fly handwriting recognition later found in Personal Digital Assistants
(PDAs), such as those running Palm OS, and in devices from other companies.
This variety of OCR is now commonly known in the industry as ICR, or
Intelligent Character Recognition.
The algorithms used in these devices take advantage of the fact that the
order, speed, and direction of individual line segments at input are known.
Also, the user can be retrained to use only specific letter shapes.
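As a minimal sketch of the extra information pen input provides, the code below takes a hypothetical stroke recorded as timestamped (x, y, t) points and derives the direction and speed of each line segment, quantizing direction into eight compass codes in the style of a Freeman chain code. The point format, the units and the function name are assumptions for illustration, not the API of the Newton or any real PDA; the point is that a scanned page carries none of this ordering or timing information.

```python
# Minimal sketch of the extra information pen input provides.
# Assumption: a stroke is a hypothetical list of (x, y, t) samples in drawing
# order, with y pointing up and t in seconds.
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def segment_features(stroke: List[Point]) -> List[Tuple[int, float]]:
    """Return (direction_code, speed) for each segment between consecutive samples.

    direction_code is 0..7, counter-clockwise from east in 45-degree steps
    (0 = east, 2 = north, 4 = west, 6 = south); speed is distance per second.
    """
    features = []
    for (x0, y0, t0), (x1, y1, t1) in zip(stroke, stroke[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        # Quantize the segment angle to the nearest multiple of 45 degrees.
        code = round(math.atan2(dy, dx) / (math.pi / 4)) % 8
        speed = math.hypot(dx, dy) / dt if dt > 0 else 0.0
        features.append((code, speed))
    return features


# An "L" drawn top to bottom, then left to right.
stroke = [(0, 2, 0.0), (0, 1, 0.1), (0, 0, 0.2), (1, 0, 0.3)]
print([code for code, _ in segment_features(stroke)])  # prints [6, 6, 0]
```

A recognizer could then compare such direction sequences against the expected stroke patterns for each letter shape, which is far easier than guessing the drawing order from finished ink.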
These methods cannot be used in software that scans paper documents, because
a scanned page records only the finished ink and not how it was written, so
accurate recognition of hand-printed documents is still largely an open
problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters
can be achieved, but that accuracy rate still translates to dozens of errors
per page, making the technology useful only in very limited contexts.
2. Other OCR Software
There are several other OCR software applications available to convert
scanned images to plain text, Word documents, HTML or searchable PDF. The
differences between them are often unclear, yet some cost around $100 while
others cost $500!
The main features that differentiate OCR applications are:
• OCR accuracy
• Page layout reconstruction accuracy
• Multi-engine voting technology
• Support for languages
• Support for searchable PDF output
• Speed
• User interface
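Multi-engine voting runs the same page through several OCR engines and keeps the answer most of them agree on. The sketch below shows the idea at the character level under a strong simplifying assumption: the engines' outputs are already aligned to the same length. Real products must first align outputs that differ in length and may weight each engine by its confidence, so treat this as an illustration of the principle only; the engine outputs are made-up strings.

```python
# Minimal sketch of character-level majority voting across OCR engines.
# Assumption: outputs are already aligned, one character per position.
from collections import Counter
from typing import List


def vote(outputs: List[str]) -> str:
    """Majority vote per character position over equally long outputs.

    Ties go to the engine listed first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not outputs or len({len(o) for o in outputs}) != 1:
        raise ValueError("expected one or more outputs of equal length")
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*outputs))


# Three hypothetical engines, each making a different single-character mistake.
print(vote(["Total: 100", "Tota1: 100", "Total: 1O0"]))  # prints Total: 100
```

Because each engine tends to fail on different characters, the majority can be right even where every individual engine made a mistake somewhere on the line.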
Because there are countless combinations of document types, OCR engines and
settings, one engine may perform better than another on your particular
documents.
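One practical way to find out which engine suits your documents is to run a few representative pages through each one and score the output against a hand-typed transcription. The sketch below uses character accuracy defined as 1 minus the Levenshtein edit distance divided by the length of the reference text. This is one common metric, not necessarily the one any vendor reports, and the engine names and outputs are hypothetical.

```python
# Minimal sketch for comparing OCR engines on your own pages.
# Assumptions: you have a correct transcription of a test page; engine names
# and outputs below are hypothetical.
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """1 - edit distance / reference length (can drop below 0 for very bad output)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1 - levenshtein(reference, ocr_output) / len(reference)


def rank_engines(reference: str, outputs: Dict[str, str]) -> List[str]:
    """Return engine names sorted by accuracy on this page, best first."""
    return sorted(outputs, key=lambda name: char_accuracy(reference, outputs[name]),
                  reverse=True)


reference = "The quick brown fox"
outputs = {"engine_b": "Tne quick brovvn fox", "engine_a": "The quick brown fox"}
print(rank_engines(reference, outputs))  # prints ['engine_a', 'engine_b']
```

Scoring several pages that resemble your real workload gives a fairer comparison than a single sample, since an engine that excels on clean text may struggle with your particular layouts or fonts.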
For this reason, most OCR software vendors offer demo downloads so you can
try out their software before purchasing it. These free trials often last 30
to 60 days, after which you can decide whether to buy the product.
My advice is to try several OCR programs, explore their features and take
your time choosing the one that fits your needs, because some of these
programs are marketed well enough to make you buy on impulse. Trying a
program yourself is the best way to find out whether it is the real deal and
whether it suits your needs.