Rebuilt format: Meaning of tokens of length 0 or less #4

Closed
simon-clematide opened this issue Dec 16, 2018 · 1 comment

Comments

@simon-clematide
Contributor

simon-clematide commented Dec 16, 2018

@mromanello There are tokens with a length of 0, and even some with a negative length of -1.
What do these values encode?

e.g. GDL-1799-11-20-a-i0004 {'c': [1203, 2081, 14, 42], 's': 617, 'l': -1}
or GDL-1799-12-22-a-i0003 {'c': [1150, 1031, 28, 33], 's': 918, 'l': 0}

@mromanello
Member

The meaning of these fields is as described in the schema.

But the length (l) of a token should never be < 1. From a quick look at the code, I believe this is a bug related to the handling of hyphenation. I am closing the issue here (it relates to the data, not to the schema) and will reopen it where it belongs.

Thanks for spotting it!
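
For readers who want to look for affected tokens in their own copy of the data, here is a minimal sketch of the check implied above. It assumes tokens are plain dicts with the keys `c`, `s` and `l` shown in the examples, grouped by content item ID. The function name `find_invalid_tokens` and the input layout are hypothetical and not part of the project's code.

```python
from typing import Any, Dict, List, Tuple

Token = Dict[str, Any]


def find_invalid_tokens(
    tokens_by_item: Dict[str, List[Token]]
) -> List[Tuple[str, Token]]:
    """Return (item_id, token) pairs whose length 'l' is missing or < 1.

    Assumption: each token is a dict with an integer 'l' field, as in the
    examples quoted in this issue. The 'c' and 's' fields are not inspected.
    """
    invalid = []
    for item_id, tokens in tokens_by_item.items():
        for token in tokens:
            length = token.get("l")
            # A valid token length is an integer of at least 1.
            if not isinstance(length, int) or length < 1:
                invalid.append((item_id, token))
    return invalid


# The two tokens reported in this issue, keyed by their content item ID.
sample = {
    "GDL-1799-11-20-a-i0004": [{"c": [1203, 2081, 14, 42], "s": 617, "l": -1}],
    "GDL-1799-12-22-a-i0003": [{"c": [1150, 1031, 28, 33], "s": 918, "l": 0}],
}

for item_id, token in find_invalid_tokens(sample):
    print(item_id, token)
```

Both reported tokens are flagged by this check. It only detects the symptom and does not address the suspected hyphenation bug itself.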

2 participants