Adding dataset LiDi 1.0 project #152
Comments
Hello Giorgia,
Thank you very much for your contribution!
It looks like there are only the XML files in your repository, which is not enough to build a complete GT dataset. I see, however, that in "sources" you put the link to the image viewer on the website of the Archivio di Stato di Torino. I think it would be useful to add, in the README of your dataset repository, a clear indication that the images are not included in the dataset but can be downloaded there (if they can be?). Basically, anything that facilitates the reconstruction of the ground truth dataset.
From comparing the viewer and your data, I have the impression that you pre-processed the images to get single pages instead of double pages. This pre-processing step might be difficult to reproduce in a way that guarantees that the images and the XML files are correctly aligned. If my understanding is right, this is, in my opinion, reason enough to publish your pre-processed images along with the XML files (if the license on the images allows it).
What do you think? Is there anything that can be done in this regard?
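For anyone trying to rebuild the dataset from separately obtained images, the alignment concern above can be checked at least at the file level. The following is only a minimal sketch under a loud assumption: that each XML transcription and its single-page image share the same file stem (for example `page_001.xml` and `page_001.jpg`). The function name, the file names and the list of image extensions are hypothetical and not taken from the LiDi repository.

```python
from pathlib import PurePath

# Assumed image extensions; adjust to whatever the archive actually delivers.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_report(file_names):
    """Compare XML and image files by stem.

    Returns a tuple (xml_without_image, image_without_xml),
    each a sorted list of file stems that have no counterpart.
    """
    xml_stems = set()
    image_stems = set()
    for name in file_names:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix == ".xml":
            xml_stems.add(path.stem)
        elif suffix in IMAGE_EXTENSIONS:
            image_stems.add(path.stem)
    return sorted(xml_stems - image_stems), sorted(image_stems - xml_stems)


# Hypothetical file listing, for illustration only.
missing_images, missing_xml = pair_report(
    ["page_001.xml", "page_001.jpg", "page_002.xml", "page_003.png"]
)
print("XML files without an image:", missing_images)
print("Images without an XML file:", missing_xml)
```

A check like this only confirms that the files pair up by name. It cannot confirm that a re-split double page matches the coordinates stored in the XML, which is why publishing the pre-processed images remains the safer option.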
Dear Alix,
I am sorry for getting back to you only now.
Thank you for your advice.
Unfortunately, the images can't be downloaded from the digital library of
the Archivio di Stato di Torino.
I will try to get permission to publish the images, which were already
pre-processed by the archive.
I will keep you posted.
Best regards,
Giorgia Agostini
Dottorato di ricerca in Storia delle Arti e dello Spettacolo - Digital
Humanities.
Università degli Studi di Firenze (SAGAS).
https://lidiws-limes.cfs.unipi.it/
On Thu, 4 Jul 2024 at 17:50, Alix Chagué <***@***.***> wrote:
…
Hello HTR-united team!
Please consider the following dataset description for inclusion in your directory.
Here is our dataset YAML file: