Let's start thinking about how to document models #16
Comments
I think the software should be one of the first things to appear, because if I'm using Transkribus, I won't care that model X or Y can handle French if they are Kraken models. That raises an important question: given that Transkribus already provides a page listing public transcription models (https://readcoop.eu/transkribus/public-models/), do we also want to cover Transkribus models? Personally, I would lean in favor of it, but it makes things a little more complicated: for example, License, Encoding and DOI might be impossible to fill in for Transkribus models.
Sorry for only starting to participate now. One rather important thing is a field that indicates the type of model (e.g. transcription, segmentation, reading order, ...) in addition to the software, so that it is possible to filter for what one is actually looking for without having to download individual models. That would probably require changing the semantics of the
As @PonteIneptique correctly identified, models are somewhat ephemeral. In my opinion we should at least provide guidelines on how to deal with that. One (not particularly well thought out) way could be to treat the record/DOI as a 'prototype' model for the dataset(s) and a particular software, and to publish replacement models (e.g. a tweaked architecture that improves performance) as a version linked to that original model instead of creating a completely new record. This is primarily meant to reduce the noise level in any model repository, but it might have other benefits as well, such as incentivizing early publication of models.
Ah, your comment reminds me that we should probably include a "date of creation" property!
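To make the fields discussed so far more concrete, here is a minimal Python sketch of what a model record could look like. Nothing in it is an agreed schema: the class names, field names, and example records (including their names and dates) are hypothetical and only illustrate the ideas raised in this thread, namely a software field, a model type field for filtering, a date of creation, optional License/Encoding/DOI (since these may be impossible to fill in for Transkribus models), and replacement models attached as versions of one 'prototype' record.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class ModelType(Enum):
    # Model types mentioned in the discussion; the list is not exhaustive.
    TRANSCRIPTION = "transcription"
    SEGMENTATION = "segmentation"
    READING_ORDER = "reading order"


@dataclass
class ModelVersion:
    # A replacement model published under the same record (hypothetical shape).
    label: str
    created: date
    note: str = ""


@dataclass
class ModelRecord:
    # The 'prototype' record for a dataset and a particular software.
    name: str
    software: str
    model_type: ModelType
    created: date
    # Optional because they may not be available for every software.
    license: Optional[str] = None
    encoding: Optional[str] = None
    doi: Optional[str] = None
    versions: List[ModelVersion] = field(default_factory=list)

    def latest(self) -> Optional[ModelVersion]:
        # Most recently created version, or None if only the prototype exists.
        if not self.versions:
            return None
        return max(self.versions, key=lambda v: v.created)


def filter_records(records, software=None, model_type=None):
    # Keep records matching every criterion that was actually given.
    return [
        r for r in records
        if (software is None or r.software == software)
        and (model_type is None or r.model_type == model_type)
    ]


# Hypothetical example records, for illustration only.
records = [
    ModelRecord("example-a", "Kraken", ModelType.TRANSCRIPTION, date(2021, 1, 10)),
    ModelRecord("example-b", "Kraken", ModelType.SEGMENTATION, date(2021, 2, 1)),
    ModelRecord("example-c", "Transkribus", ModelType.TRANSCRIPTION, date(2021, 3, 5)),
]

# A tweaked architecture is added as a version, not as a new record.
records[0].versions.append(ModelVersion("v2", date(2021, 6, 1), "tweaked architecture"))

kraken_transcription = filter_records(
    records, software="Kraken", model_type=ModelType.TRANSCRIPTION
)
print([r.name for r in kraken_transcription])
```

With a structure along these lines, someone using Kraken could filter for transcription models without downloading anything, and the number of top-level records stays stable when a model is retrained.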
Hello all, unfortunately I could not take part in the discussion so far. I would now like to continue it.
Both schemas are closely related in content, but each has its own particularities. The schema for GT can currently be considered stable. My proposal for describing the metadata of a model has always been based on the GT. Now, of course, there are other scenarios:
In the first case, there should be a connection between the model and the GT.
For now I have expressed all of this in natural language, since I assume the formal notation can then be worked out more easily.
See: HTR-United/htr-united#91
An example provided by @tboenig: https://tboenig.github.io/gt-metadata/document-your-gt.html (it ties the description of the model to the description of the dataset)
A proposal from @PonteIneptique: