OCR recognition model #158

Open · wants to merge 7 commits into main
Conversation

@sokovninn (Member) commented Jan 23, 2025

New OCR recognition model, loss, metric and visualizer

The most important changes are summarized below:

Losses:

  • Introduced CTCLoss with optional focal loss weighting in luxonis_train/attached_modules/losses/ctc_loss.py and updated __init__.py to include CTCLoss.
  • Updated luxonis_train/attached_modules/losses/README.md to document CTCLoss.
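Not the PR's actual implementation, but for context: a common focal-CTC formulation weights each sample's CTC loss by (1 - p) ** gamma, where p = exp(-loss) approximates the probability the model assigns to the correct transcription. A minimal sketch in plain Python; the function name and default gamma are assumptions:

```python
import math


def focal_weighted_ctc(per_sample_losses, gamma=2.0):
    """Mean CTC loss with focal weighting.

    p = exp(-loss) is (approximately) the probability the model assigns
    to the correct transcription, so (1 - p) ** gamma down-weights easy,
    already-confident samples; gamma=0 recovers the plain mean.
    """
    weighted = [((1.0 - math.exp(-loss)) ** gamma) * loss for loss in per_sample_losses]
    return sum(weighted) / len(weighted)
```

With gamma=0 this reduces to the ordinary mean CTC loss, which makes the focal term easy to toggle from a config flag.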

Metrics:

  • Added OCRAccuracy metric for OCR tasks in luxonis_train/attached_modules/metrics/ocr_accuracy.py and updated __init__.py to include OCRAccuracy.
  • Updated luxonis_train/attached_modules/metrics/README.md to document OCRAccuracy.
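For intuition (this is a hedged sketch, not the code in ocr_accuracy.py): OCR accuracy metrics typically compare decoded strings against targets, either as exact matches or with a tolerance measured in edit distance. A self-contained plain-Python version, with made-up function names:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ocr_accuracy(preds, targets, max_errors=0):
    """Fraction of predictions within `max_errors` edits of their target."""
    hits = sum(edit_distance(p, t) <= max_errors for p, t in zip(preds, targets))
    return hits / len(preds)


preds, targets = ["hello", "w0rld", "ocr"], ["hello", "world", "ocr"]
print(ocr_accuracy(preds, targets))                # 2 exact matches out of 3
print(ocr_accuracy(preds, targets, max_errors=1))  # all within one edit
```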

Visualizers:

  • Introduced OCRVisualizer for visualizing OCR tasks in luxonis_train/attached_modules/visualizers/ocr_visualizer.py and updated __init__.py to include OCRVisualizer.
  • Updated luxonis_train/attached_modules/visualizers/README.md to document OCRVisualizer.

Predefined Models:

  • Added OCRRecognitionModel to luxonis_train/config/predefined_models/__init__.py and updated README.md to document its components and parameters.
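Not part of the diff: predefined models in luxonis_train are typically referenced from a training config roughly along these lines. The exact keys and the model name below are assumptions; the config README in the repo is authoritative:

```yaml
model:
  name: ocr_recognition_example
  predefined_model:
    name: OCRRecognitionModel
```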

Toy dataset creation example

import glob
import os

from tqdm import tqdm


def toy_ocr_generator():
    im_paths = glob.glob("*.png")
    labels = [os.path.splitext(os.path.basename(path))[0] for path in im_paths]
    for path, label in tqdm(zip(im_paths, labels)):
        if label:  # skip images with an empty label
            yield {
                "file": path,
                "annotation": {
                    "metadata": {"text": label, "text_length": len(label)},
                },
            }
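To sanity-check the record format without real images, the same generator logic can be exercised against placeholder files. This is a self-contained sketch: the directory and file names are made up, and tqdm is dropped for brevity:

```python
import glob
import os
import tempfile


def toy_ocr_generator(image_dir):
    # Each image file is named after the text it contains, e.g. "hello.png".
    im_paths = glob.glob(os.path.join(image_dir, "*.png"))
    labels = [os.path.splitext(os.path.basename(p))[0] for p in im_paths]
    for path, label in zip(im_paths, labels):
        if label:  # skip images with an empty label
            yield {
                "file": path,
                "annotation": {"metadata": {"text": label, "text_length": len(label)}},
            }


# Create a few empty placeholder files just to demonstrate the record format.
tmp = tempfile.mkdtemp()
for name in ("hello", "world"):
    open(os.path.join(tmp, f"{name}.png"), "wb").close()

records = sorted(toy_ocr_generator(tmp), key=lambda r: r["annotation"]["metadata"]["text"])
print(records[0]["annotation"]["metadata"])  # {'text': 'hello', 'text_length': 5}
```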

Examples from the overfitted model on the toy dataset

[Attached images: ocr_recognition-OCRCTCHead_OCRVisualizer_0 through _3]

@sokovninn sokovninn requested a review from a team as a code owner January 23, 2025 21:36
@sokovninn sokovninn requested review from kozlov721, klemen1999, tersekmatija and conorsim and removed request for a team January 23, 2025 21:36
@github-actions bot added the documentation and enhancement labels on Jan 23, 2025
@sokovninn (Member, Author) commented Jan 23, 2025

Possible improvements include:

  • Adding more advanced OCR metrics
  • Adding a temporal NRTR head together with NRTRLoss
  • Improving visualization
  • Adding a large variant
  • Improving the encoder to handle more edge cases
  • Adding a beam search decoder
  • Adding OCR-specific augmentations
  • Adding an OCR detection model (same backbone)
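On the beam-search item: the baseline a beam search would improve on is greedy CTC decoding, which takes the per-frame argmax, collapses repeated ids, and drops the blank token. A self-contained sketch; the alphabet, blank index, and id-to-character offset are assumptions:

```python
def ctc_greedy_decode(argmax_ids, alphabet, blank=0):
    """Collapse repeated ids, drop blanks, then map ids to characters."""
    out = []
    prev = None
    for i in argmax_ids:
        if i != prev and i != blank:
            out.append(alphabet[i - 1])  # ids are offset by 1 for the blank
        prev = i
    return "".join(out)


alphabet = "abcdefghijklmnopqrstuvwxyz"
# Per-frame argmax over the class dimension; 0 is the CTC blank.
print(ctc_greedy_decode([3, 3, 0, 1, 0, 20, 20, 0, 19], alphabet))  # -> "cats"
```

Note that a blank between two identical ids is what keeps genuine double letters (e.g. [1, 0, 1] decodes to "aa", while [1, 1] collapses to "a").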

@klemen1999 (Collaborator) left a comment:

Generally LGTM, left some comments. One thing we also want to make sure of is the integration with HubAI and depthai-nodes: the archived model should have correct archive data so that the parser can parse it.

target = [chr(int(char.item())) for char in target]
target = "".join(target)
target_strings.append(target)
print(target_strings)
Collaborator:
Probably not needed (also another print in the forward())


## `OCRRecognitionModel`

FPS of the `OCRRecognitionModel` on different devices with image size 48x320:
Collaborator:
Let's add this under a ### Performance Metrics section to be in line with the other models. Before that, add a very short description; for this model it would be valuable to note how the dataset needs to be structured, i.e. which annotations need to be present.

class OCRRecognitionModel(BasePredefinedModel):
"""A predefined model for OCR recognition tasks."""

def __init__(
Collaborator:
Let's add a variant, even if it is just "light" for now, as it is then simpler to integrate on the HubAI side (same parameter as all other predefined models).

return kernel * t, beta - running_mean * gamma / std # type: ignore


class SELayer(nn.Module):
Collaborator:
I believe this is the same block as SqueezeExciteBlock with approx_sigmoid=True. If so, can we remove it here and replace it with that one?

@sokovninn (Member, Author):
Good point! I didn't notice that they were the same.

logger = logging.getLogger(__name__)


NET_CONFIG_det = {
Collaborator:
IMO we can put this into variants.py (similarly to what we do with e.g. EfficientRep) so the code is a bit cleaner.

from luxonis_train.utils import OCRDecoder, OCREncoder


def get_para_bias_attr(l2_decay: float, k: int):
Collaborator:
Nitpick: move this to the bottom so the class is at the top of the file; IMO cleaner.

class OCRCTCHead(BaseHead[Tensor, Tensor]):
in_channels: int
tasks: list[TaskType] = [TaskType.CLASSIFICATION]

Collaborator:
Does the exported model look the same as PaddleOCR's? Can we use the same parser for it from depthai-nodes (this one)? Ideally we want to support the full integration with HubAI.

return x


class Block(nn.Module):
Collaborator:
Nitpick: Can we name it something more descriptive?

Labels: documentation, enhancement
2 participants