Building an intelligent character recognition solution

Building a cloud-native intelligent character recognition solution

We ran a test and found that Amazon Textract is a well-rounded intelligent character recognition (ICR) service, but it requires a lot of custom code to build an overall solution that meets common business needs.

Published at

9 September 2021

Background

In computer science, intelligent character recognition (ICR) is an advanced form of optical character recognition (OCR) or, more specifically, a handwriting recognition system that allows a computer to learn fonts and different handwriting styles during processing, improving accuracy and recognition levels.

ICR and OCR technologies have been adopted by businesses to deal effectively with whitemail and other physical data that is inaccessible to computer systems. These documents include letters, forms, labels, tickets, and more.

Whilst digital transformation channels have helped to decrease the volume of whitemail that a business needs to handle, businesses are still required to support the physical format. Processing it can prove to be labour-intensive, slow and error-prone.

Opportunity

At the AWS re:Invent keynote in 2018, Andy Jassy (then CEO of AWS) announced the launch of Amazon Textract. Amazon Textract is a machine learning service that automatically extracts text, handwriting and data from scanned documents, and that can identify, understand, and extract data from forms and tables. This announcement marked a huge step-change for many organisations, in both the private and public sectors, in dealing with the challenges of ICR.

We now have public cloud storage such as Amazon Simple Storage Service (Amazon S3) and Azure Blob Storage, as well as artificial intelligence services such as Amazon Textract and Azure Form Recognizer. This combination provides an opportunity to automate whitemail processing, resulting in near real-time data capture.

Whilst these services were impressive in our testing, they are not faultless for the capture of handwritten text as standalone offerings. Pre- and post-processing can, however, dramatically improve the accuracy of these native services.

Once accuracy is within tolerable levels, the solution can then provide genuine cost reduction through repurposing existing staff to higher-value tasks.

Observations

Pre- and post-processing should be delivered using a serverless architecture, with AWS Lambda or Azure Functions. This architecture keeps the overall solution cloud-native, fully managed, highly available, secure and scalable, and you pay for the solution only while it is in use. Many organisations affected by the pandemic have recently seen their whitemail services grow to a national scale. Where this is the case, a serverless architecture should be given serious consideration.
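As a rough illustration of the serverless approach on AWS, the sketch below shows a Lambda-style handler that reacts to an S3 upload notification and asks Amazon Textract for the lines of text in each uploaded scan. It is a minimal sketch under stated assumptions, not the solution we tested: the function names are hypothetical, the scan is assumed to be a single-page image already stored in S3, and error handling is omitted.

```python
import urllib.parse


def s3_objects_from_event(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        yield bucket, key


def extract_lines(textract_client, bucket, key):
    """Return the text of every LINE block Textract detects in one S3 object."""
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def handler(event, context):
    """Hypothetical Lambda entry point: one list of lines per uploaded scan."""
    import boto3  # imported here so the helpers above can be tested without it

    client = boto3.client("textract")
    return {
        key: extract_lines(client, bucket, key)
        for bucket, key in s3_objects_from_event(event)
    }
```

Passing the client into extract_lines keeps the Textract call easy to replace with a stub in tests, and leaves room for pre- and post-processing steps either side of it.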

Pre-processing

Our testing found that an improvement in accuracy of up to 5% can be achieved merely by converting a graphic file from colour to monochrome (which also has the benefit of reducing storage costs) and improving the file's sharpness. Online tools can be used for this, including the recently announced AWS Serverless Image Handler.
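To make the two pre-processing steps concrete, here is a minimal sketch in plain Python that works on rows of pixel values. It assumes "monochrome" means greyscale and uses the common ITU-R BT.601 luma weights and a standard 3x3 sharpening kernel; these choices are illustrative assumptions and not necessarily the settings used in our tests. In practice an imaging library or service would do this work far faster.

```python
# A widely used 3x3 sharpening kernel (an illustrative choice).
SHARPEN_KERNEL = ((0, -1, 0), (-1, 5, -1), (0, -1, 0))


def to_greyscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def sharpen(grey_rows):
    """Apply the kernel to interior pixels; border pixels are left unchanged."""
    height = len(grey_rows)
    width = len(grey_rows[0]) if height else 0
    out = [row[:] for row in grey_rows]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    weight = SHARPEN_KERNEL[ky][kx]
                    total += weight * grey_rows[y + ky - 1][x + kx - 1]
            # Clamp to the valid 8-bit range.
            out[y][x] = max(0, min(255, total))
    return out
```

A pixel that is slightly darker or lighter than its neighbours ends up further from them after sharpening, which is what makes pen strokes stand out from the page.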

Post-processing

Once the ICR service has processed the file, further improvements can and should be made to the quality of the output to increase accuracy. Some of the value-added processing used to optimise the native offerings includes:

Field Class Constraints

This enhancement lets you make safe assumptions for a field where you expect to see specific character formatting.
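As an example of such a constraint, the sketch below handles a field that is expected to contain only digits, such as an account number. It replaces letters that are easily mistaken for digits and rejects anything that still does not fit. The look-alike table and the function name are hypothetical and would need tuning against real output.

```python
import re

# Hypothetical table of letters commonly misread in place of digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "Z": "2", "S": "5", "B": "8"}
)


def constrain_numeric(raw, length=None):
    """Return the field as a digit string, or None if it cannot be one."""
    # Drop whitespace and hyphens, then swap look-alike letters for digits.
    cleaned = re.sub(r"[\s\-]", "", raw).translate(DIGIT_LOOKALIKES)
    if not re.fullmatch(r"[0-9]+", cleaned):
        return None
    if length is not None and len(cleaned) != length:
        return None
    return cleaned
```

Returning None rather than a best guess lets the caller route the document to manual review when the field cannot be repaired safely.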

Lookup Validation

Where data sets and integration services exist, these can validate the output.
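A simple form of lookup validation can be sketched with Python's standard difflib module: the recognised value is compared with a reference list and snapped to the closest entry if it is similar enough. The function name, the reference list and the 0.8 similarity cutoff are all assumptions for illustration; a real solution would more likely call an existing data set or integration service.

```python
import difflib


def validate_against_lookup(value, reference_values, cutoff=0.8):
    """Return the matching reference value, or None if nothing is close enough."""
    # Compare case-insensitively but return the reference's own spelling.
    normalised = {ref.casefold(): ref for ref in reference_values}
    key = value.strip().casefold()
    if key in normalised:
        return normalised[key]
    matches = difflib.get_close_matches(key, list(normalised), n=1, cutoff=cutoff)
    return normalised[matches[0]] if matches else None


towns = ["Manchester", "Birmingham", "Leeds"]
corrected = validate_against_lookup("Manchestor", towns)
```

Here a single misread character is corrected against the list, whereas a value with no close match is returned as None so that it can be flagged for review.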

Only the beginning

As with data captured through other digital transformation channels, this is only the beginning of the journey; the goal should never be solely to migrate the data from one format to another. Consider the value that can be added to the data through enrichment or analytical services, or by integrating disparate systems.