Merge pull request #2 from andreybicalho/attention-ocr
Attention ocr
andreybicalho authored Jun 19, 2020
2 parents a38053b + f260326 commit ff2dbb4
Showing 28 changed files with 949 additions and 1,459 deletions.
4 changes: 1 addition & 3 deletions .gitignore
@@ -7,11 +7,9 @@ data/*
*/data/*
config/*
*/runs/*
*/EMNISTNet/data/*
*/EMNISTNet/custom_dataset/*
*/EMNISTNet/mini_custom_dataset/*
runs/
debug/*
ssigalpr_samples/
#*.jpg
#*.JPG
#*.jpeg
119 changes: 29 additions & 90 deletions README.md
@@ -4,42 +4,23 @@

# What's this repo about?

This is a simple approach to vehicle registration plate detection and recognition. It is not an end-to-end system; instead, three different methods were stacked together to complete the task. The [*YOLO*](https://github.com/pjreddie/darknet) object detection algorithm was used to detect license plate regions, then a marker-based segmentation method using the watershed algorithm was applied to extract the character digits. After that, a Convolutional Neural Network (CNN), *EMNISTNet*, and the "vanilla" [*Tesseract-OCR*](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition (OCR) engine were used to recognize the extracted digits.
This is a simple approach to vehicle registration plate detection and recognition. It is not an end-to-end system; instead, two different deep learning methods were stacked together to complete the task. The [*YOLO*](https://github.com/AlexeyAB/darknet) object detection algorithm was used to detect license plate regions, then an `Attention Based Optical Character Recognition` model, [*Attention-OCR*](https://github.com/wptoux/attention-ocr), was applied to recognize the characters.

![Output](docs/result.jpg "Output")*Output: vehicle license plate and recognized digits were blurred for an obvious reason.*

Note that this is far from a perfect solution to the problem. Although YOLO does a great job of finding the license plate regions and character recognition is pretty straightforward nowadays, further improvements could be made. For instance, the character segmentation method used here gives poor results on noisy images, which decreases OCR accuracy. One could address this issue by applying other image processing techniques, such as image equalization and morphological operations, to improve image quality and remove as many of the undesired image parts as possible.
![Output](docs/result.jpg "Output")*Results (vehicle license plate and recognized characters were intentionally blurred).*

# Install and Requirements

````
pip install -r requirements.txt
````

## Tesseract-OCR (optional)

If you also want to use *Tesseract-OCR* for the character recognition task, follow the instructions below:

* Tesseract-OCR binaries:
````
sudo apt update
sudo apt install tesseract-ocr
````

* Tesseract-OCR Python API:
````
pip install pytesseract==0.3.3
````

## Pre-trained Weights

Download the pre-trained weights for YOLO and EMNISTNet and put them in the `config` directory.
Download the pre-trained weights for YOLO and Attention-OCR and put them in the `config` directory.

* *YOLO* was trained on the Brazilian [SSIG-ALPR](http://smartsenselab.dcc.ufmg.br/en/dataset/banco-de-dados-sense-alpr/) dataset.
* *YOLO* and *Attention-OCR* were trained on the Brazilian [SSIG-ALPR](http://smartsenselab.dcc.ufmg.br/en/dataset/banco-de-dados-sense-alpr/) dataset.
* `TODO:` upload weights and other config files somewhere.

* *EMNISTNet* was trained on the [EMNIST](https://www.nist.gov/itl/products-and-services/emnist-dataset) `bymerge` dataset until it reaches around 89% of accuracy, then training was continued with a custom dataset for fine-tuning. (`TODO:` link the custom dataset).
* `TODO:` upload weights

# Running

Run the application API:
@@ -58,78 +39,36 @@ curl --location --request POST 'localhost:5000/' \

### API Output:

Although multiple detections and recognitions are possible in the same image, the API will output the prediction for the detection with the highest confidence.
The API outputs all detections with their bounding boxes and confidence scores, as well as the OCR prediction for each bounding box. All of this information is also drawn on the input image, which is returned as a base64-encoded image.

`json object` response:
`json object` response will look like the following:

````
{
"bounding_box": {
"h": 51,
"w": 127,
"x": 1474,
"y": 520
},
"classId": "0",
"confidence": 1.0,
"emnist_net_preds": "ABC1234",
"tesseract_preds": "ABC1234"
"detections": [
{
"bb_confidence": 0.973590612411499,
"bounding_box": [
1509,
877,
82,
39
],
"ocr_pred": "ABC1234-"
},
{
"bb_confidence": 0.9556514024734497,
"bounding_box": [
161,
866,
100,
40
],
"ocr_pred": "ABC1234-"
}
],
"output_image": "/9j/4AAQS..."
}
````
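
As a rough illustration of how a client might consume the newer response format, the sketch below parses the JSON body, collects one record per detection, and decodes the base64 image. The function name `summarize_response` is hypothetical, and the assumption that `bounding_box` is ordered `[x, y, w, h]` comes only from the key names in the older response format; the source does not confirm it. The sample `output_image` value is a three-byte stand-in, not real API output.

```python
import base64
import json


def summarize_response(body):
    """Parse an API response body and return (detections, image_bytes)."""
    payload = json.loads(body)
    rows = []
    for det in payload.get("detections", []):
        # Assumption: the four values are [x, y, w, h], mirroring the old keys.
        x, y, w, h = det["bounding_box"]
        rows.append(
            {
                "text": det["ocr_pred"],
                "confidence": det["bb_confidence"],
                "box": (x, y, w, h),
            }
        )
    # The annotated image arrives as base64 text; decode it to raw bytes.
    image_bytes = base64.b64decode(payload.get("output_image", ""))
    return rows, image_bytes


# Stand-in body shaped like the example response; "/9j/" is a tiny placeholder image payload.
sample_body = json.dumps(
    {
        "detections": [
            {
                "bb_confidence": 0.973590612411499,
                "bounding_box": [1509, 877, 82, 39],
                "ocr_pred": "ABC1234-",
            }
        ],
        "output_image": "/9j/",
    }
)

detections, image_bytes = summarize_response(sample_body)
for det in detections:
    print(det["text"], round(det["confidence"], 3), det["box"])
```

The decoded bytes could then be written to a `.jpg` file, assuming the server encodes the annotated image as JPEG, which the `/9j/` prefix in the example response suggests.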

*Note: If the `DEBUG` flag is set to `True` in `app.py`, images will be written to the `debug` directory to make debugging a bit easier.*

# How To Train

If you want to train the models by yourself, or just want to use your custom datasets, just follow the instructions below:

## YOLO

* You can find [here](https://github.com/AlexeyAB/darknet) very clear instructions on how to train YOLO on your dataset.

## EMNISTNet

Go to the EMNISTNet directory and simply type:
````
python train_model.py --e=5 --cuda --v
````

* Params:
* --e=number_of_epochs: the number of epochs you want to train your model
* --cuda: if you want to train on GPU that supports CUDA
* --v: verbose mode

### Fine-tuning on a custom dataset

EMNIST is a dataset of handwritten characters and digits, whereas the characters extracted from license plates are not handwritten, so EMNISTNet may not reach the desired accuracy on these images. To circumvent this issue, training was continued on a custom dataset whose characters are closer to our problem domain. `Data Augmentation` methods, such as `rotation` and `shear`, were also applied.

<figure align="center">
<img src="docs/custom_dataset_example.jpg" />
<figcaption>Custom dataset: examples of character digits.</figcaption>
</figure>


````
python train_model.py --m=emnist_model.pt --d=custom_dataset/ --e=10 --cuda --v
````

* Params:
* --m=previous_model.pt: start weights from a pre-trained model and continue training from there
* --d=path_to_the_custom_dataset: path to our custom dataset
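
For readers unfamiliar with this flag style, here is a minimal sketch of how the flags listed above could be declared with Python's standard `argparse` module. It is not taken from `train_model.py`; the function name `build_parser`, the default values, and the help strings are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare the training flags described above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    parser.add_argument("--e", type=int, default=5, help="number of epochs")
    parser.add_argument("--cuda", action="store_true", help="train on a CUDA GPU")
    parser.add_argument("--v", action="store_true", help="verbose mode")
    parser.add_argument("--m", default=None, help="pre-trained model to resume from")
    parser.add_argument("--d", default=None, help="path to a custom dataset")
    return parser


# Parse the same arguments as the fine-tuning command shown in this section.
args = build_parser().parse_args(
    ["--m=emnist_model.pt", "--d=custom_dataset/", "--e=10", "--cuda", "--v"]
)
print(args.e, args.cuda, args.v, args.m, args.d)
```

Boolean switches such as `--cuda` and `--v` take no value, while `--e`, `--m`, and `--d` accept the `--name=value` form used in the commands.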

*Note: PyTorch's folder-based dataset loading assigns its own internal class indices to the target labels, following the alphabetical order of the image subdirectories that serve as class labels. To make those indices line up with the following `idx`:*

`idx = ['0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']`

*I arranged the images in the custom dataset as shown below:*

````
root_image_dir/a/0_image1.png
root_image_dir/a/0_image2.png
root_image_dir/a/0_imageN.png
root_image_dir/ab/1_imageN.png
root_image_dir/abc/2_imageN.png
root_image_dir/.../..._imageN.png
root_image_dir/abcdefghijklmnopqrstuvwxyzabcdefghi/Y_imageN.png
root_image_dir/abcdefghijklmnopqrstuvwxyzabcdefghij/Z_imageN.png
````
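
The directory naming trick above can be checked with a short sketch. It assumes the class indices come from sorting subdirectory names alphabetically, as torchvision's `ImageFolder` does; whether the repository used that exact class is an assumption. The names `IDX`, `BASE`, `class_directories`, and `directory_to_label` are hypothetical.

```python
import string

# Target label order from the note above: digits first, then uppercase letters.
IDX = list(string.digits + string.ascii_uppercase)

# 36-character base string; class n lives in the directory named by its first n characters.
BASE = (string.ascii_lowercase * 2)[: len(IDX)]


def class_directories():
    """Return one directory name per class: "a", "ab", "abc", ..."""
    return [BASE[:n] for n in range(1, len(IDX) + 1)]


def directory_to_label():
    """Map each directory name to the label implied by its sorted position."""
    # Sort alphabetically, the way a folder-based loader would assign indices.
    ordered = sorted(class_directories())
    return {name: IDX[i] for i, name in enumerate(ordered)}


mapping = directory_to_label()
print(mapping["a"], mapping["ab"], mapping[BASE])
```

Because every directory name is a prefix of the next one, alphabetical order coincides with length order, so the directory with `n` characters always receives index `n - 1`.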
6 changes: 4 additions & 2 deletions requirements.txt
@@ -15,5 +15,7 @@ Flask==1.1.1
imutils==0.5.3
scikit-image==0.16.2
tensorboard==1.14.0
torch==1.4.0+cpu
torchvision==0.5.0+cpu
torch==1.4.0
torchvision==0.5.0
tqdm==4.46.1
Pillow==7.1.1
60 changes: 0 additions & 60 deletions src/EMNISTNet/exploring_custom_dataset.py

This file was deleted.

71 changes: 0 additions & 71 deletions src/EMNISTNet/exploring_emnist.py

This file was deleted.
