Businesses can automate receipt scanning through receipt OCR engine APIs. Yet before deciding to integrate new technology into an organisation, it’s helpful to understand the basics of how it works.
In this article, I will give a brief tour of the fundamentals underpinning Taggun’s receipt OCR engine. Specifically, I will outline 5 important phases to give you some insight into how our receipt scanner API is built. These are (1) OCR Support, (2) Classification, (3) Named Entity Recognition, (4) Specialised Entity Extraction and (5) Data Enrichment.
Receipt OCR and NLP Background
What may be surprising to some is that the receipt OCR is actually the simplest aspect of the engine, although it does involve some complexities, such as choosing the providers and working around the typically poor print quality of receipts.
The art fuelling Taggun’s engine aligns more closely with Natural Language Processing (NLP).
NLP refers to the branch of AI concerned with giving computers the ability to understand text and spoken words in much the same way human beings can (ref. 1). In essence, Taggun transforms syntactical data into semantic information.
Five Crucial Phases of Taggun’s Receipt OCR API Engine
1) OCR support from multiple providers
Once a file is uploaded, Taggun’s OCR receipt scanner sends it to Google Vision or Microsoft Cognitive Services. Taggun supports multiple OCR providers so that it can pick the most accurate result. The switch to Microsoft Cognitive Services is seamless for customers.
This outputs the OCR results: raw text with (X, Y) coordinates.
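To make the shape of this output concrete, here is a minimal sketch in Python of how words with coordinates might be regrouped into text lines. The `Word` class, its fields and the `group_into_lines` helper are hypothetical names chosen for illustration; they are not Taggun's or any OCR provider's actual schema, and the tolerance value is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position in pixels (hypothetical field)
    y: float  # vertical position in pixels (hypothetical field)


def group_into_lines(words, y_tolerance=8.0):
    """Group words with similar y positions, then read each line left to right."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if it sits close to the previous word vertically.
        if lines and abs(lines[-1][-1].y - word.y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


# OCR output often arrives out of reading order, as simulated here.
words = [
    Word("TOTAL", 20, 300), Word("12.50", 180, 302),
    Word("Coffee", 20, 100), Word("4.50", 180, 101),
    Word("Sandwich", 20, 130), Word("8.00", 180, 129),
]
print(group_into_lines(words))
```

Rebuilding lines like this matters because later phases rely on labels and amounts that sit on the same printed line.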
2) Classification
Contextual awareness is built around the file to enhance the extraction process. Examples include:
- Determine the scope of the amount
- Predict the type/format of file (e.g., invoice, receipt, screenshot, email)
- Predict the language
- Predict the geolocation (from the near parameter or IP address)
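As a rough illustration of predicting the file type, the sketch below scores a document against keyword lists. This is a toy heuristic under stated assumptions, not Taggun's classifier: the `DOCUMENT_KEYWORDS` table and the `predict_document_type` function are hypothetical, and a production system would more likely use a trained model.

```python
# Hypothetical keyword lists; real documents vary far more than this.
DOCUMENT_KEYWORDS = {
    "invoice": ("invoice", "due date", "bill to", "payment terms"),
    "receipt": ("receipt", "cashier", "change", "thank you"),
    "email": ("from:", "to:", "subject:"),
}


def predict_document_type(text):
    """Return the document type whose keywords appear most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in DOCUMENT_KEYWORDS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "unknown"


print(predict_document_type("INVOICE #123\nBill to: ACME\nDue date: 1 May"))
print(predict_document_type("Cashier: Ann\nChange 0.50\nThank you"))
```

Even in this toy form, the predicted type gives later phases useful context, for example whether to look for an invoice number or a receipt number.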
3) Named Entity Recognition
All basic information that is available from the text is extracted, e.g.,
- Locations (city, state, country)
4) Specialised Entity Extraction
Various algorithms and methods are used to predict the best result for each entity.
For example, a receipt may contain five different amounts, but which one is the Total Amount? Which one is the Tax Amount? This becomes increasingly challenging as the format, content and language of receipts and invoices grow more diverse. Examples of specialised entities:
- Total Amount
- Tax Amount
- Merchant Verification
- Merchant Name
- Receipt Number
- Invoice Number
- Multi Tax Line Items
- Payment Type (e.g., credit card, cash, Visa, Mastercard)
- Fapiao Invoice Number and Code
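The Total Amount question can be sketched with a simple label-based heuristic. The code below is an illustrative assumption rather than Taggun's method: `pick_total_amount` is a hypothetical helper that prefers an amount on a line labelled "total" and otherwise falls back to the largest amount found.

```python
import re

# Matches amounts with two decimal places, such as 13.75.
AMOUNT_PATTERN = re.compile(r"\d+\.\d{2}")


def pick_total_amount(lines):
    """Return the most likely total from receipt lines, or None if no amount is found."""
    candidates = []
    for line in lines:
        lowered = line.lower()
        # A 'total' label is strong evidence; subtotal and tax lines are excluded.
        labelled = "total" in lowered and "subtotal" not in lowered and "tax" not in lowered
        for match in AMOUNT_PATTERN.findall(line):
            candidates.append((labelled, float(match)))
    if not candidates:
        return None
    # Tuples compare element by element: labelled amounts win, then the larger amount.
    return max(candidates)[1]


receipt_lines = [
    "Coffee 4.50", "Sandwich 8.00", "Subtotal 12.50",
    "Tax 1.25", "TOTAL 13.75", "Cash 20.00", "Change 6.25",
]
print(pick_total_amount(receipt_lines))
```

Note that the largest number on this receipt is the cash tendered, not the total, which is why a naive "pick the biggest amount" rule is not enough on its own.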
For ABN and Merchant Verification (VAT ID), the official checksum method is followed to validate each number and improve accuracy.
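As an example of this kind of validation, the sketch below implements the publicly documented ABN check: subtract 1 from the first digit, multiply each of the 11 digits by its weight, and confirm that the sum is divisible by 89. The function name `is_valid_abn` is hypothetical, and this is an illustration of the published algorithm rather than Taggun's own code.

```python
# Weighting factors from the published ABN validation algorithm.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(abn):
    """Check an Australian Business Number against its weighted checksum."""
    # Ignore spaces and any other non-digit characters picked up by OCR.
    digits = [int(ch) for ch in abn if ch in "0123456789"]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    weighted_sum = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return weighted_sum % 89 == 0


print(is_valid_abn("51 824 753 556"))  # passes the checksum
print(is_valid_abn("51 824 753 557"))  # last digit changed, fails the checksum
```

A check like this is useful after OCR because a single misread digit will usually fail the checksum, so a bad candidate can be rejected before it reaches the customer.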
The more complex entities, like Multi Tax Line Items, require the recognition of patterns in the text, so grouped information can be accurately extracted (such as tax rate, gross tax amount, net tax amount).
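To illustrate pattern-based grouping, here is a minimal sketch that pulls tax rows out of text with a regular expression. It assumes a hypothetical layout in which each row reads "rate% net tax gross"; real receipts differ widely, and `TAX_LINE` and `extract_tax_lines` are illustrative names only.

```python
import re

# Assumed row layout: "<rate>% <net> <tax> <gross>", e.g. "19% 20.00 3.80 23.80".
TAX_LINE = re.compile(
    r"(?P<rate>\d+(?:\.\d+)?)\s*%\s+"
    r"(?P<net>\d+\.\d{2})\s+(?P<tax>\d+\.\d{2})\s+(?P<gross>\d+\.\d{2})"
)


def extract_tax_lines(text):
    """Return one dict per tax row whose net + tax matches its gross amount."""
    rows = []
    for match in TAX_LINE.finditer(text):
        row = {name: float(value) for name, value in match.groupdict().items()}
        # Drop rows that are not internally consistent, a likely sign of an OCR misread.
        if abs(row["net"] + row["tax"] - row["gross"]) < 0.005:
            rows.append(row)
    return rows


sample = "A 7% 10.00 0.70 10.70\nB 19% 20.00 3.80 23.80"
for row in extract_tax_lines(sample):
    print(row)
```

Extracting the values as a group allows them to be cross-checked against each other, which a field-by-field approach cannot do.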
The Merchant Name entity can be trained with feedback for each account, so accuracy improves over time, especially for each individual account.
5) Data Enrichment
Other useful public APIs are called opportunistically to retrieve additional information for our customers, e.g.,
- Fetch additional info for ABN – for Australian customers
- Fetch additional info for VAT ID – for European customers
- Fetch and verify additional information about the location using Google Places
- Normalise merchant names
The result is returned in JSON, a widely accepted data format that developers can easily work with in any programming language or software. On top of that, Taggun returns the results immediately with each API request, so developers do not need to make additional requests or build webhook endpoints to retrieve them. We designed our API carefully so that it is easy for your developers to integrate your software with us. Most companies take only one week to fully integrate with our API.
These phases are summarised in this diagram
The ability to create and interpret meaningful phrases from a finite set of words and grammatical rules is an enormous advantage for humanity on both evolutionary and individual scales. Linguistic processing was once unique to humans; however, with the recent rise in automation, machine learning has enabled language to be processed at a volume, speed and accuracy greatly exceeding human capacity.
Now, the ability to utilise machine learning to process language is a necessary advantage in the business world. The automation of receipt and invoice processing provides an example of this, allowing companies to improve operations in expense management, marketing, accounting and IT sectors.
Overall, the OCR software enables the syntactical data from receipts and invoices to be fed deeper into Taggun’s system for analysis. Here, NLP techniques “read” and “interpret” the text to enhance the raw data and transform it into meaningful concepts. The engine identifies the significant notions in the text and presents them to humans in a form that’s easy for us to understand and use.
“The limits of my language are the limits of my world”
– Ludwig Wittgenstein