Document processing is a necessity, yet it consumes so much time that every large organization is striving to find ways to automate it. However, to be feasible, any solution needs to be as accurate as possible, especially in the case of financial documents.
Old-School OCR
Until AI became an option, simple optical character recognition (OCR) could not operate properly unless strict rules and templates were in place. These tools did not process context and had no feedback or self-regulating mechanisms, so manual supervision was still necessary for anyone using them.
This was done by first asking the user to mark out the area where the program should perform the extraction. This way of dealing with documents was somewhat better than inputting data by hand, but it was far from a fully automated, independent solution.
OCR works acceptably when documents are in excellent visual condition and follow the templates loaded into the system. The problem with this approach is that a new template has to be created and uploaded for every type of document.
Sometimes creating such templates can be as time-consuming or as costly as manual data input. Another downside is that template-based OCR has no flexibility regarding document formats. Any slight variation requires a new template with new rules, even for very similar documents like invoices created by different vendors.
How Can AI Help OCR?
AI has the advantage of recognizing patterns in much the same way a human brain does. It can also use the information it retrieves to make decisions. This differs from traditional OCR: AI does not merely look at individual letter contours and try to guess the characters. After performing this initial step, it can do much more, such as checking dictionaries for words and looking at the context to make sure the selected combination matches the surrounding information.
This feedback loop is easier to implement for words, but similar rules can be applied to validate numbers and other financial information.
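To make the idea concrete, here is a minimal sketch of such a word-level check, assuming the raw OCR output is compared against a small dictionary using Python's standard difflib; the word list and the similarity cutoff are purely illustrative.

```python
# Minimal sketch of the word-level feedback loop: compare each raw OCR guess
# against a dictionary and replace it with the closest known word.
# The tiny word list is illustrative only.
import difflib

DICTIONARY = ["invoice", "total", "amount", "vendor", "payment"]

def correct(word: str) -> str:
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct("lnvoice"))   # -> "invoice": an 'l' misread as 'i' is fixed by the dictionary
print(correct("totaI"))     # -> "total"
```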
AI’s Modus Operandi
The building blocks of AI are neural networks, a concept dating back to the 1940s that consists of layers of interconnected processing units. To function properly, these networks need to be trained on labeled examples of the data they are meant to recognize.
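As a rough illustration of what training on labeled examples means in practice, the sketch below fits a small neural network to scikit-learn's bundled digit images; the dataset, network size, and settings are stand-ins for the example, not a production OCR setup.

```python
# Minimal sketch: training a small neural network on labeled character images.
# Uses scikit-learn's bundled digits dataset purely as an illustration.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                      # 8x8 grayscale images of digits 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=42)
clf.fit(X_train, y_train)                   # learn from labeled variations
print("accuracy:", clf.score(X_test, y_test))
```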
Once the algorithm begins to understand how information is structured and to identify the components of a document, it can tolerate small variations. For example, it can learn that the total amount of an invoice is usually listed at the end of the document, but if the invoice spans multiple pages it will look for the specific words “total amount.”
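A heavily simplified version of that fallback rule could look like the snippet below, which searches the recognized text for the phrase “total amount” followed by a number; the regular expression and the sample text are hypothetical.

```python
# Simplified sketch of the fallback rule described above: when layout cues fail,
# search the extracted text for the phrase "total amount" followed by a number.
import re

def find_total(ocr_text: str):
    match = re.search(r"total\s+amount\s*[:\-]?\s*([\d.,]+)", ocr_text, re.IGNORECASE)
    return match.group(1) if match else None

print(find_total("Subtotal: 90.00\nTax: 10.00\nTotal amount: 100.00"))  # -> 100.00
```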
The main challenges of AI for OCR are the lack of qualified personnel and the cost of implementation. Most large organizations are well aware of the need to adopt AI for data processing tasks where they would traditionally use OCR, but they remain trapped in legacy logic.
Adoption is currently slow because, although the benefits are obvious, there are still too few success stories to generate momentum.
Building an OCR Engine
Some companies choose to repurpose old OCR algorithms. Scientists from InData Labs, however, have already created an efficient, AI-powered OCR engine and described the process in detail on their blog, in six steps.
It all starts with image acquisition, which usually transforms the document into a black-and-white version that is easier to parse in subsequent steps.
Most of the time this is not enough: the image will still contain residual noise that makes reading harder and the results less accurate. Noise removal is especially important for handwritten documents, which show significant variability. One way to integrate AI into this step is to use computer vision techniques.
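A minimal preprocessing sketch along these lines, using OpenCV, might combine grayscale conversion, denoising, and Otsu binarization; the file names and parameter values below are placeholders rather than recommended settings.

```python
# Preprocessing sketch with OpenCV: convert to grayscale, remove noise, then
# binarize so later steps see clean black-and-white input.
import cv2

img = cv2.imread("scanned_invoice.png")                   # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
denoised = cv2.fastNlMeansDenoising(gray, h=30)           # suppress scanner noise
_, binary = cv2.threshold(denoised, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu binarization
cv2.imwrite("clean_invoice.png", binary)
```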
Once the input is clean, it is time to break it into atomic pieces that can be analyzed and sorted into predefined bins, or grouped with clustering algorithms to find the most appropriate classification.
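One rough way to perform this segmentation, continuing the OpenCV sketch above, is to find connected dark regions on the binarized page and cut out a bounding box for each candidate symbol; the size filter is an arbitrary example value.

```python
# Rough segmentation sketch: find connected dark regions in the binarized page
# and crop a bounding box for each, treating every box as one candidate glyph.
# Line and word ordering are ignored; this only illustrates the idea.
import cv2

binary = cv2.imread("clean_invoice.png", cv2.IMREAD_GRAYSCALE)
inverted = cv2.bitwise_not(binary)                         # contours expect white-on-black
contours, _ = cv2.findContours(inverted, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

glyphs = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w > 2 and h > 2:                                    # drop specks of leftover noise
        glyphs.append(binary[y:y + h, x:x + w])
print(f"extracted {len(glyphs)} candidate symbols")
```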
The next step means looking at individual characteristics to identify differences and decide on the final classification of each character. This is the moment when the real power of the neural network shows. By looking at thousands of small variations, such as different fonts, sizes, or weights, the system becomes able to recognize symbols much as a person does while reading.
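For illustration, a compact convolutional classifier of the kind often used for single characters could be defined as below in PyTorch; the architecture, input size, and class count are assumptions made for this sketch, not the design described in the article.

```python
# A compact convolutional classifier for single characters (sizes are illustrative).
import torch
import torch.nn as nn

class CharClassifier(nn.Module):
    def __init__(self, num_classes: int = 62):      # e.g. digits plus upper/lowercase letters
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)   # assumes 28x28 input glyphs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = CharClassifier()
logits = model(torch.randn(1, 1, 28, 28))            # one fake 28x28 glyph
print(logits.shape)                                   # torch.Size([1, 62])
```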
The last step is concerned with checking, refining, or correcting the result. So far, there is no technical solution that makes this step 100% automatic; a human is still needed to make the final corrections.
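In practice this usually means routing low-confidence recognitions to a reviewer, as in the hypothetical sketch below; the confidence threshold and the (text, confidence) input format are made up for the example.

```python
# Hedged sketch of the final check: route low-confidence results to a person.
def review_queue(predictions, threshold: float = 0.95):
    """predictions: list of (text, confidence) pairs from the recognition step."""
    accepted, needs_human = [], []
    for text, confidence in predictions:
        (accepted if confidence >= threshold else needs_human).append(text)
    return accepted, needs_human

ok, manual = review_queue([("Total: 100.00", 0.99), ("Vendor: Acrne Ltd", 0.72)])
print(manual)   # -> ['Vendor: Acrne Ltd']  (flagged for a human to correct)
```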
Problems and Features
As this workflow suggests, the main issues arise from the quality of the input material and the sensitivity of the algorithm.
A useful AI tool supports a wide variety of input methods and file types. It should at least be able to read the most common image formats, PDFs, emails, spreadsheets, and more.
Once extracted, the information needs to be convertible to any necessary format and ready to be imported into various platforms (such as ERP and CMS systems). Last but not least, the tool should scale easily to accommodate any data volume.
When Is OCR a Better Option?
The narrative so far does not intend to position AI as a panacea, but as a natural evolution of, and alternative to, conventional OCR. However, there are some use cases where plain OCR is enough. These include barcode scanning and any fixed formats, such as QR codes. Even some invoice formats work perfectly without neural networks.
However, if your input comes in even a few variations, creating OCR templates becomes costlier than training an AI.