Volumes < 1 #12

Open
mromanello opened this issue Sep 7, 2022 · 7 comments
Comments

@mromanello

Hi guys 👋

First off, big kudos to you both for this neat suite of tools to work with HTR-united's data.

I'm preparing a dataset that contains page region annotations (Zones) but not OCR groundtruth.
[Screenshot: Screen Shot 2022-09-07 at 17 24 09]
The dataset passes HTRVX and HTR_United_Metadata_Generator validation without issues, but it fails with HTRUC because the volume of characters and lines is < 1 (there is no OCR).
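To make the failure concrete, here is a minimal Python sketch. It assumes the catalog schema stores volume as a list of `{"metric": ..., "count": ...}` records and requires each `count` to be at least 1 (a JSON Schema-style `"minimum": 1`). The record shape, the `volume_errors` helper and the counts are hypothetical illustrations, not the actual schema or HTRUC code.

```python
# Assumed constraint: every volume record needs count >= 1 (like JSON Schema "minimum": 1).
MIN_COUNT = 1


def volume_errors(volume, min_count=MIN_COUNT):
    """Return one message per volume record whose count is below min_count.

    Hypothetical helper: `volume` is a list of {"metric": str, "count": int} records.
    """
    errors = []
    for record in volume:
        if record["count"] < min_count:
            errors.append(f"{record['metric']}: {record['count']} < {min_count}")
    return errors


# Illustrative layout-only dataset: regions are annotated, but nothing is transcribed.
volume = [
    {"metric": "pages", "count": 10},
    {"metric": "lines", "count": 0},
    {"metric": "characters", "count": 0},
]

print(volume_errors(volume))
# → ['lines: 0 < 1', 'characters: 0 < 1']
```

Calling `volume_errors(volume, min_count=0)` returns an empty list, which mimics relaxing the constraint so that a dataset with OLR but no OCR ground truth validates.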

My assumption was that it would be possible to add to HTR-united's catalog a dataset containing OLR but not OCR GT data... but perhaps this is not true? 🤔

Any help is appreciated 🙏 cheers!

@PonteIneptique
Member

This is a needed feature: it should be possible to skip updating quantities whose value is 0, to avoid an HTRUC failure.

As for the dataset, @alix-tz, I think we should allow that, but maybe add a new feature such as a dataset type? (Not for right now.)

@mromanello
Author

Thanks @PonteIneptique for the quick reply 🙏 So for now we live with it, all clear! I'll make a PR in any case to add our dataset to the HTR-united catalog ASAP.

Out of curiosity: is this a validity constraint that can (should?) be relaxed in the JSON schema?

@PonteIneptique
Member

That's a good question. I strongly believe we should not allow 0-valued quantities, but I can see an argument for it. It'll depend on @alix-tz's feedback.
I personally feel like it's more of an HTRUC issue than anything else (i.e., do not feed 0 values into the quantity key).
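The approach suggested here (leave zero-valued quantities out instead of writing them into the metadata) could look roughly like the following Python sketch. The `update_volume` helper and the `{"metric": ..., "count": ...}` record shape are hypothetical; this is not HTRUC's actual implementation.

```python
def update_volume(existing, computed):
    """Merge freshly computed counts into an existing volume list.

    Hypothetical helper: both arguments are lists of {"metric": str, "count": int}
    records (an assumed shape). Zero counts are skipped so they never reach the
    quantity key, where a "minimum: 1" constraint would reject them.
    """
    # Start from the existing positive counts, keyed by metric name.
    merged = {r["metric"]: r["count"] for r in existing if r["count"] > 0}
    for record in computed:
        if record["count"] > 0:
            # Only non-zero counts overwrite or add a metric.
            merged[record["metric"]] = record["count"]
    return [{"metric": metric, "count": count} for metric, count in merged.items()]


existing = [{"metric": "pages", "count": 8}]
computed = [
    {"metric": "pages", "count": 10},
    {"metric": "lines", "count": 0},
    {"metric": "characters", "count": 0},
]
print(update_volume(existing, computed))
# → [{'metric': 'pages', 'count': 10}]
```

With this design a layout-only dataset simply ends up without `lines` or `characters` entries, rather than with 0-valued ones.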

@alix-tz
Member

alix-tz commented Sep 9, 2022

OK, I definitely agree that we should allow HTRUC to pass in such cases; all clear for me too!

@PonteIneptique
Member

I hotfixed the schema accordingly. Can you check if it works, @mromanello?

@mromanello
Author

mromanello commented Sep 12, 2022

Thanks @PonteIneptique, I've just re-run the GH Actions, and the red flag is gone! I'll soon make a PR to add the dataset to the HTR-united catalog. (As far as I'm concerned, this issue can be closed.)

@PonteIneptique
Member

I released it as a hotfix, as it should not break anything.
We'll have to check that the HTR-United website does not ignore your data, because I remember at some point adding a check for 0-sized datasets...
