Merge pull request #2 from andreybicalho/attention-ocr

Attention ocr
andreybicalho · Jun 19, 2020 · ff2dbb4
2 parents a38053b + f260326

Showing 28 changed files with 949 additions and 1,459 deletions.
diff --git a/.gitignore b/.gitignore
@@ -7,11 +7,9 @@ data/*
 */data/*
 config/*
 */runs/*
-*/EMNISTNet/data/*
-*/EMNISTNet/custom_dataset/*
-*/EMNISTNet/mini_custom_dataset/*
 runs/
 debug/*
+ssigalpr_samples/
 #*.jpg
 #*.JPG
 #*.jpeg

diff --git a/README.md b/README.md
@@ -4,42 +4,23 @@
 
 # What's this repo about?
 
-This is a simple approach for vehicle registration plate detection and recognition. It is not an end-to-end system, instead, three different methods were stacked together to complete this task. [*YOLO*](https://github.com/pjreddie/darknet) object detection algorithm was used to detect license plate regions, then a marker-based segmentation method using watershed algorithm was applied to extract the character digits. After that, a Convolutional Neural Network (CNN) - *EMNISTNet* - and the "vanilla" [*Tesseract-OCR*](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition (OCR) were used to recognize the extracted digits.
+This is a simple approach to vehicle registration plate detection and recognition. It is not an end-to-end system; instead, two deep learning methods were stacked together to complete this task: the [*YOLO*](https://github.com/AlexeyAB/darknet) object detection algorithm was used to detect license plate regions, and then an attention-based Optical Character Recognition (OCR) model, [*Attention-OCR*](https://github.com/wptoux/attention-ocr), was applied to recognize the characters.
 
-![Output](docs/result.jpg "Output")*Output: vehicle license plate and recognized digits were blurred for an obvious reason.*
-
-Note that it is far from being a perfect solution to this problem. Although YOLO does a great job of finding the license plate regions and character recognition is pretty straight forward nowadays, further improvements could be made. For instance, the character segmentation method used here gives poor results for noisy images, and thus, decreasing OCR accuracy. One could address this issue by applying other image processing algorithms, such as image equalization, morphological operations, among others, to improve image quality and remove as much as possible of the undesired image parts.
+![Output](docs/result.jpg "Output")*Results (vehicle license plate and recognized characters were intentionally blurred).*
 
 # Install and Requirements
 
 ````
 pip install -r requirements.txt
 ````
 
-## Tesseract-OCR (optional)
-
-If you also want to use *Tesseract-OCR* for the character recognition task, follow the instructions below:
-
-* Tesseract-OCR binaries:
-````
-sudo apt update sudo apt install tesseract-ocr
-````
-
-* Tesseract-OCR Python API:
-````
-pip install pytesseract==0.3.3
-````
-
 ## Pre-trained Weights
 
-Download the pre-trained weights for the YOLO and EMNISTNet and put it in the `config` directory.
+Download the pre-trained weights for YOLO and Attention-OCR and put them in the `config` directory.
 
-* *YOLO* was trained on the Brazilian [SSIG-ALPR](http://smartsenselab.dcc.ufmg.br/en/dataset/banco-de-dados-sense-alpr/) dataset.
+* *YOLO* and *Attention-OCR* were trained on the Brazilian [SSIG-ALPR](http://smartsenselab.dcc.ufmg.br/en/dataset/banco-de-dados-sense-alpr/) dataset.
   * `TODO:` upload weights and other config files somewhere.
 
-* *EMNISTNet* was trained on the [EMNIST](https://www.nist.gov/itl/products-and-services/emnist-dataset) `bymerge` dataset until it reaches around 89% of accuracy, then training was continued with a custom dataset for fine-tuning. (`TODO:` link the custom dataset).
-  * `TODO:` upload weights
-
 # Running
 
 Run the application API:
@@ -58,78 +39,36 @@ curl --location --request POST 'localhost:5000/' \
 
 ### API Output:
 
-Although multiple detections and recognitions are possible in the same image, the API will output the prediction for the detection with the highest confidence. 
+The API outputs every detection with its bounding box and confidence score, as well as the OCR prediction for each bounding box. It also draws all of this information on the input image and returns that image as a base64-encoded string (`output_image`).
 
-`json object` response:
+`json object` response will look like the following:
 
 ````
 {
-  "bounding_box": {
-    "h": 51,
-    "w": 127,
-    "x": 1474,
-    "y": 520
-  },
-  "classId": "0",
-  "confidence": 1.0,
-  "emnist_net_preds": "ABC1234",
-  "tesseract_preds": "ABC1234"
+  "detections": [
+    {
+      "bb_confidence": 0.973590612411499,
+      "bounding_box": [
+        1509,
+        877,
+        82,
+        39
+      ],
+      "ocr_pred": "ABC1234-"
+    },
+    {
+      "bb_confidence": 0.9556514024734497,
+      "bounding_box": [
+        161,
+        866,
+        100,
+        40
+      ],
+      "ocr_pred": "ABC1234-"
+    }
+  ],
+  "output_image": "/9j/4AAQS..."
 }
 ````
 
 *Note: If `DEBUG` flag is set to `True` in the `app.py`, images will be produced in the `debug` directory to make debug a bit easier.*
-
-# How To Train
-
-If you want to train the models by yourself, or just want to use your custom datasets, just follow the instructions below:
-
-## YOLO
-
-* You can find [here](https://github.com/AlexeyAB/darknet) very clear instructions on how to train YOLO on your dataset.
-
-## EMNISTNet
-
-Go the EMNISTNet directory and simply type:
-````
-python train_model.py --e=5 --cuda --v
-````
-
-* Params: 
-  * --e=number_of_epochs: the number of epochs you want to train your model
-  * --cuda: if you want to train on GPU that supports CUDA
-  * --v: verbose mode
-
-### Fine-tuning on a custom dataset
-
-As we know the EMNIST is a handwritten character digits dataset and the extracted digits of license plates are not handwritten, so EMNISTNet may not give the desired accuracy on these particular images. To circumvent this issue, training was carried out on a custom dataset where digits are more like to our problem domain. `Data Augmentation` methods, such as `rotation` and `shear`, was also applied.
-
-<figure align="center">
-    <img src="docs/custom_dataset_example.jpg" />
-    <figcaption>Custom dataset: examples of character digits.</figcaption>
-</figure>
-
-
-````
-python train_model.py --m=emnist_model.pt --d=custom_dataset/ --e=10 --cuda --v
-````
-
-* Params: 
-  * --m=previous_model.pt: start weights from a pre-trained model and continue training from there
-  * --d=path_to_the_custom_dataset: path to our custom dataset
-
-*Note: Since pytorch DataLoader keeps its own internal class indexes for the target labels based on the alphabetical order, and image subdirectories are used as class labels, in order to keep track of the `idx` like this:*
-
-`idx = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']`
-
-*I managed to put images in the custom dataset as shown below:*
-
-````
-root_image_dir/a/0_image1.png
-root_image_dir/a/0_image2.png
-root_image_dir/a/0_imageN.png
-root_image_dir/ab/1_imageN.png
-root_image_dir/abc/2_imageN.png
-root_image_dir/.../..._imageN.png
-root_image_dir/abcdefghijklmnopqrstuvwxyzabcdefghi/Y_imageN.png
-root_image_dir/abcdefghijklmnopqrstuvwxyzabcdefghij/Z_imageN.png
-````
diff --git a/requirements.txt b/requirements.txt
@@ -15,5 +15,7 @@ Flask==1.1.1
 imutils==0.5.3
 scikit-image==0.16.2
 tensorboard==1.14.0
-torch==1.4.0+cpu 
-torchvision==0.5.0+cpu
+torch==1.4.0
+torchvision==0.5.0
+tqdm==4.46.1
+Pillow==7.1.1
diff --git a/src/EMNISTNet/exploring_custom_dataset.py b/src/EMNISTNet/exploring_custom_dataset.py
diff --git a/src/EMNISTNet/exploring_emnist.py b/src/EMNISTNet/exploring_emnist.py