OCR recognition model #158
base: main
…feat/ocr-recognition
Possible improvements include:
Generally LGTM, left some comments. One more thing we want to verify is the integration with HubAI and depthai-nodes: the archived model should carry correct archive data so that the parser can parse it.
target = [chr(int(char.item())) for char in target]
target = "".join(target)
target_strings.append(target)
print(target_strings)
Probably not needed (also another print in the forward())
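If the decoded strings are still useful while debugging, one option is to send them to a logger at debug level instead of printing. The sketch below mirrors the snippet above with plain integer lists standing in for tensors. `decode_targets` is a hypothetical helper name, and the code assumes every entry is a valid Unicode code point.

```python
import logging

logger = logging.getLogger(__name__)


def decode_targets(targets: list[list[int]]) -> list[str]:
    """Turn sequences of Unicode code points into strings."""
    target_strings = ["".join(chr(code) for code in target) for target in targets]
    # Debug-level messages stay silent unless debug logging is enabled.
    logger.debug("Decoded targets: %s", target_strings)
    return target_strings


# Hypothetical input: code points for "OCR" and "158".
decoded = decode_targets([[79, 67, 82], [49, 53, 56]])
```

With a real batch, each tensor element would first be converted with `int(char.item())`, as in the original snippet.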
## `OCRRecognitionModel`

FPS of the `OCRRecognitionModel` on different devices with image size 48x320:
Let's add this under a ### Performance Metrics section to be in line with the others. Before that, add just a very short description; for this model it would be valuable to note how the dataset needs to be structured, i.e. which annotations need to be present.
class OCRRecognitionModel(BasePredefinedModel):
    """A predefined model for OCR recognition tasks."""

    def __init__(
Let's add a variant, even if it is just "light" for now, as it is then simpler to integrate on the HubAI side (same parameter as all other predefined models).
return kernel * t, beta - running_mean * gamma / std  # type: ignore


class SELayer(nn.Module):
I believe this is the same block as SqueezeExciteBlock with approx_sigmoid=True. If so, can we remove it here and use that one instead?
Good point! I didn't notice that they were the same.
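For context, a squeeze-and-excitation block with an approximate sigmoid can be sketched in plain Python as below. This is an illustration only, not the code of SELayer or SqueezeExciteBlock: it assumes that approx_sigmoid=True means replacing the sigmoid gate with a hard sigmoid, and it uses the clamp(x / 6 + 0.5, 0, 1) form found in PyTorch's `nn.Hardsigmoid`. Some implementations use a different slope, so the equivalence is worth checking numerically before removing a block. The function names and weights are made up for the example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation: clamp(x / 6 + 0.5, 0, 1)."""
    return min(max(x / 6.0 + 0.5, 0.0), 1.0)


def squeeze_excite(channels, w1, w2, gate=hard_sigmoid):
    """channels: one list of activations per channel; w1, w2: FC weight rows."""
    # Squeeze: global average per channel.
    squeezed = [sum(c) / len(c) for c in channels]
    # Excite: reduce -> ReLU -> expand -> gate.
    hidden = [max(sum(w * s for w, s in zip(row, squeezed)), 0.0) for row in w1]
    gates = [gate(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale every activation by the gate of its channel.
    return [[v * g for v in c] for c, g in zip(channels, gates)]


# Hypothetical two-channel input with a 2 -> 1 -> 2 bottleneck.
features = [[1.0, 3.0], [2.0, 2.0]]
out = squeeze_excite(features, w1=[[1.0, 1.0]], w2=[[3.0], [-3.0]])
```

Passing `gate=sigmoid` gives the exact variant, which makes it easy to compare the two gates on the same input.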
logger = logging.getLogger(__name__)


NET_CONFIG_det = {
IMO we can put this into variants.py (similar to what we do with e.g. EfficientRep) so the code is a bit cleaner
from luxonis_train.utils import OCRDecoder, OCREncoder


def get_para_bias_attr(l2_decay: float, k: int):
Nitpick: Move this to the bottom so class is on top of the file, IMO cleaner
class OCRCTCHead(BaseHead[Tensor, Tensor]):
    in_channels: int
    tasks: list[TaskType] = [TaskType.CLASSIFICATION]
Does the exported model look the same as PaddleOCR? Can we use the same parser for it from depthai-nodes (this one)? Ideally we want to support the full integration with HubAI
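Whether one parser can serve both models depends largely on how the CTC output is decoded. As a reference point, standard greedy CTC decoding can be sketched as below. This is not the depthai-nodes parser; it assumes the blank token sits at index 0 and that character ids start at 1, and the charset and frame ids are hypothetical. Both assumptions would need to be checked against the exported model.

```python
BLANK = 0  # assumed blank index; verify against the exported model


def ctc_greedy_decode(frame_ids: list[int], charset: str) -> str:
    """Collapse repeated ids, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != previous and idx != BLANK:
            # Index 0 is the blank, so character ids are shifted by one.
            chars.append(charset[idx - 1])
        previous = idx
    return "".join(chars)


# Hypothetical per-frame argmax ids for a 9-frame output.
text = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3], charset="abc")
```

The blank between the two runs of id 1 is what allows a doubled character to survive the collapse step, so a mismatch in blank index between the model and the parser would corrupt the decoded text.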
return x


class Block(nn.Module):
Nitpick: Can we name it something more descriptive?
New OCR recognition model, loss, metric and visualizer
The most important changes are summarized below:
Losses:
- Added CTCLoss with optional focal loss weighting in luxonis_train/attached_modules/losses/ctc_loss.py and updated __init__.py to include CTCLoss. [1] [2] [3]
- Updated luxonis_train/attached_modules/losses/README.md to document CTCLoss.
Metrics:
- Added OCRAccuracy metric for OCR tasks in luxonis_train/attached_modules/metrics/ocr_accuracy.py and updated __init__.py to include OCRAccuracy. [1] [2] [3]
- Updated luxonis_train/attached_modules/metrics/README.md to document OCRAccuracy.
Visualizers:
- Added OCRVisualizer for visualizing OCR tasks in luxonis_train/attached_modules/visualizers/ocr_visualizer.py and updated __init__.py to include OCRVisualizer. [1] [2] [3]
- Updated luxonis_train/attached_modules/visualizers/README.md to document OCRVisualizer.
Predefined Models:
- Added OCRRecognitionModel to luxonis_train/config/predefined_models/__init__.py and updated README.md to document its components and parameters. [1] [2] [3]
Toy dataset creation example
Examples from the overfitted model on the toy dataset
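The focal loss weighting mentioned for CTCLoss in the summary is not spelled out on this page. A minimal sketch of one common formulation is shown below, assuming each per-sample CTC loss is scaled by (1 - p)^gamma with p = exp(-loss); the function name, the default gamma and the mean reduction are assumptions, not taken from ctc_loss.py.

```python
import math


def focal_weighted_ctc(ctc_losses: list[float], gamma: float = 2.0) -> float:
    """Scale each per-sample CTC loss by (1 - p)^gamma, where p = exp(-loss)."""
    weighted = []
    for loss in ctc_losses:
        # CTC loss is a negative log-likelihood, so exp(-loss) is the
        # probability the model assigns to the target sequence.
        p = math.exp(-loss)
        weighted.append(loss * (1.0 - p) ** gamma)
    # Mean reduction over the batch (an assumption for this sketch).
    return sum(weighted) / len(weighted)


# Hypothetical batch: one perfectly predicted sample, one with p = 0.5.
batch_loss = focal_weighted_ctc([0.0, math.log(2.0)])
```

Samples the model already predicts with high probability get a weight near zero, so the batch loss is dominated by the hard samples.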