Athento-Imaging is a Python library built on OpenCV to improve OCR results on documents. It has been tested on passports, bills, delivery notes, budgets, and other common documents.
You can check everything out in the Athento-Imaging Summary.
The quality of the output and its OCR performance will depend on:
- The quality of the source document: the higher the quality, the better the OCR results.
- The amount of noise in the document and its position.
- The presence and position of the document's watermarks.
- The colour of the document. Light colours are easier to remove than dark ones, because the pixel values of a dark background are close to those of the text.
- Your personal experience in image transformation, as you might need to combine several operations or change the parameter values significantly.
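The point about colour can be illustrated with a small, library-independent sketch. It is not part of the Athento-Imaging API: the helper names (`midpoint_threshold`, `binarize`, `tolerates_noise`) and the grayscale values are hypothetical, and the text is assumed to be darker than the background. In practice a real pipeline would more likely rely on OpenCV thresholding (for example `cv2.threshold`) than on plain lists.

```python
# Toy illustration (hypothetical helpers, not the Athento-Imaging API).
# Pixels are grayscale values in 0-255; text is assumed darker than background.

def midpoint_threshold(text_value, background_value):
    """Pick a threshold halfway between the text and background values."""
    return (text_value + background_value) // 2


def binarize(pixels, threshold):
    """Set pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in pixels]


def tolerates_noise(text_value, background_value, noise):
    """Return True if the midpoint threshold still separates text from
    background when every pixel may deviate by up to `noise`."""
    t = midpoint_threshold(text_value, background_value)
    # Worst case: text gets brighter and background gets darker.
    return text_value + noise <= t < background_value - noise


# Dark text (30) on a light background (220): wide gap between values.
light_row = [[220, 30, 220]]
light_result = binarize(light_row, midpoint_threshold(30, 220))

# Dark text (30) on a dark background (70): narrow gap between values.
light_ok = tolerates_noise(30, 220, noise=25)
dark_ok = tolerates_noise(30, 70, noise=25)
```

With these illustrative values, the light background leaves a wide margin around the threshold, while the dark background leaves a margin smaller than the assumed noise level, so some pixels would be misclassified. This is why darker documents tend to need more parameter tuning or a combination of operations.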