You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is relevant specifically to grc. Because modern books of Ancient Greek often has to mark out uncertain letters in ancient sources, letters with dot below are a common occurrence but are at present not recognised by tesseract.
I wonder if recognising dot below shouldn’t be a feature behind a flag to be manually turned on because it might also pick up stains in older books (which however tend not to have such dots & so don’t require this feature). But this could make it difficult to deploy the feature in downstream projects like Internet Archive.
The text was updated successfully, but these errors were encountered:
This is relevant specifically to grc. Because modern books of Ancient Greek often has to mark out uncertain letters in ancient sources, letters with dot below are a common occurrence but are at present not recognised by tesseract.
A fairly complete list of letters with dot below (except for the lunate sigma ϲ̣) can be found here: https://titus.uni-frankfurt.de/unicode/unicsel/grkkadd.htm
I wonder if recognising dot below shouldn’t be a feature behind a flag to be manually turned on because it might also pick up stains in older books (which however tend not to have such dots & so don’t require this feature). But this could make it difficult to deploy the feature in downstream projects like Internet Archive.
The text was updated successfully, but these errors were encountered: