OCR stands for Optical Character Recognition, also called text recognition. It is a technology that converts different types of documents, such as scanned paper documents, PDF files, or digital photos, into editable and usable formats.
An OCR system starts from a digital image of a page (printed paper, typewritten sheet, etc.), produced by an optical scanner or a digital camera, and generates a text file in one of various formats (plain text, word-processing format, XML, etc.). The same principle applies to video.
Recognizing text in pictures and video has been an active area of research in computer science since the late 1950s. At first the problem seemed simple, but it turned out to be a much more complex subject.
One of the first OCR machines was created by Gustav Tauschek, an Austrian engineer, who patented it in Germany in 1929. It was a mechanical device that combined a light-sensitive detector with a set of stored templates, and it signaled a recognition when the text in front of it matched one of those templates.
The first systems had to be “trained” on a given font before they could read it. Nowadays, it is common to find “smart” systems that can recognize most fonts with high accuracy.
OCR technology has kept evolving since its first use. Today, many optical character recognition software products exist and are widely used in the business world.
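To make the idea of matching against stored templates more concrete, here is a minimal sketch in Python. It is only an illustration under simplified assumptions, not how any particular OCR product works: the 3x3 bitmaps, the `TEMPLATES` table, and the `score` and `recognize` helpers are all hypothetical names invented for this example.

```python
# Hypothetical 3x3 glyph templates ("1" = ink, "0" = blank).
# Real systems work on far larger bitmaps and many more characters.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize(glyph, templates=TEMPLATES):
    """Return the label of the template that agrees with the glyph on most pixels."""
    return max(templates, key=lambda label: score(glyph, templates[label]))

# A "T" with one flipped pixel in the bottom-right corner.
noisy_t = ("111", "010", "011")
print(recognize(noisy_t))  # prints "T"
```

A matcher like this only knows the shapes stored in its table, which is roughly why early systems had to be prepared for each font, while modern systems typically rely on more general statistical models.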
Authôt: You speak. We write.