How to extract the account number from cheque/check images


I am working on a task to extract the account number from cheque images. My current approach has two steps:

  1. Localize the account number digits (printed digits)
  2. Perform OCR using an OCR library such as Tesseract

The second step is straightforward, assuming the account number digits have been localized properly.

I tried to localize the account number digits using OpenCV contour methods and MSER (maximally stable extremal regions), but didn’t get useful results. It’s difficult to generalize a pattern because:

  • Different bank cheques have variations in template
  • Account number position is not fixed

How can we approach this problem? Do I have to look into deep-learning-based approaches?

Sample Images


Answers:


Assuming the account number is printed in a distinct purple text color, we can use color thresholding. The idea is to convert the image to the HSV color space, define a lower/upper color range, and threshold with cv2.inRange(). From there, we filter contours by area to remove small noise, then invert the mask, since we want black text on a white background. As a last step, we apply a Gaussian blur before passing the image to Pytesseract. Here's the result:


Result from Pytesseract

30002010108841
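
The string Pytesseract returns can carry trailing whitespace or stray non-digit characters around the number. A minimal sketch of a cleanup step, assuming the account number is the longest run of digits in the OCR output; the helper name `extract_account_number` and the `min_digits` cutoff are hypothetical and not part of the original answer:

```python
import re


def extract_account_number(ocr_text, min_digits=6):
    """Return the longest run of digits in ocr_text, or None if none is long enough.

    min_digits is an assumed cutoff to ignore short numeric fragments.
    """
    runs = re.findall(r"\d+", ocr_text)
    if not runs:
        return None
    best = max(runs, key=len)
    return best if len(best) >= min_digits else None


# Example with trailing whitespace, as OCR output often has
print(extract_account_number("30002010108841\n"))  # prints 30002010108841
```

Picking the longest digit run is only a heuristic; if other long numbers (for example, a cheque number) survive the color mask, a stricter rule would be needed.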

Code

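import cv2
import numpy as np
import pytesseract

# Windows only: point pytesseract at the Tesseract executable (adjust or remove as needed)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image, convert to HSV, and keep only pixels inside the color range
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([103, 79, 60])
upper = np.array([129, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# Remove small noise by filling tiny contours with black
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]  # OpenCV 3.x returns 3 values, other versions 2
for c in cnts:
    area = cv2.contourArea(c)
    if area < 10:
        cv2.drawContours(mask, [c], -1, (0, 0, 0), -1)

# Invert so the text is black on white, then blur slightly before OCR
mask = 255 - mask
mask = cv2.GaussianBlur(mask, (3, 3), 0)

# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(mask, lang='eng', config='--psm 6')
print(data)

cv2.imshow('mask', mask)
cv2.waitKey()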
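
The lower/upper bounds above are on OpenCV's 8-bit HSV scale, where H runs from 0 to 179 (degrees divided by 2) and S and V run from 0 to 255, so H between 103 and 129 covers roughly 206° to 258°. For a cheque with a different ink color, one way to sanity-check a candidate range is to convert a sampled RGB pixel to that scale. The sketch below uses only the standard library and approximates OpenCV's conversion (values may differ by one from cv2's rounding); the sampled ink color is a made-up example, and the helper names are hypothetical:

```python
import colorsys

# Bounds copied from the answer's code (OpenCV 8-bit HSV scale)
LOWER = (103, 79, 60)
UPPER = (129, 255, 255)


def rgb_to_opencv_hsv(r, g, b):
    """Approximate OpenCV's 8-bit HSV for an RGB pixel (H: 0-179, S/V: 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def in_range(hsv, lower=LOWER, upper=UPPER):
    """Channel-wise inclusive range check, mirroring what cv2.inRange does per pixel."""
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))


ink = rgb_to_opencv_hsv(60, 50, 160)      # hypothetical bluish-purple ink sample
paper = rgb_to_opencv_hsv(255, 255, 255)  # white background
print(ink, in_range(ink))
print(paper, in_range(paper))
```

If the sampled ink falls outside the range, widening or shifting the H bounds is usually the first thing to try.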