[BNL - Lux importer] Investigate and fix the logical matching of physical articles and content-items #130

Open
piconti opened this issue May 2, 2024 · 1 comment

piconti commented May 2, 2024

As described under patch #4 in this Google Sheet, there is a problem with the logic that separates the identified regions into content-items in the BNL data.

Since the content-items are created and re-merged in the importer's code, at least some of the problems most likely stem from there.
The goal is therefore to rethink how the BNL importer implements the logical construction of content-items.

Note that this will most likely break the current content-item IDs for all affected BNL data.

Ideally, this is done at the same time as ingesting the updated OCR for the BNL data.

@piconti piconti self-assigned this May 2, 2024

piconti commented Jun 5, 2024

New information on this issue:
The version of the data visible on the interface turns out not to be the version stored in S3 for the last documented release, but an earlier one.

The problem with the BNL article reconstruction may therefore already have been fixed, in which case no further fix would be needed.
However, this probably means that there will be a break in the content-item IDs.
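
To gauge how large such a break in content-item IDs would be, one option is to diff the set of IDs behind the interface against the set in the S3 release. The snippet below is only a minimal sketch: the function name `diff_content_item_ids` and the example IDs are hypothetical, and it assumes both ID lists have already been extracted from the two data versions by some other means.

```python
def diff_content_item_ids(old_ids, new_ids):
    """Compare two collections of content-item IDs.

    Returns the IDs present in both versions ("kept"), only in the old
    version ("dropped"), and only in the new version ("added").
    """
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "kept": sorted(old_set & new_set),
        "dropped": sorted(old_set - new_set),
        "added": sorted(new_set - old_set),
    }


# Hypothetical IDs, for illustration only (not taken from the real data).
interface_ids = [
    "paper-1900-01-02-a-i0001",
    "paper-1900-01-02-a-i0002",
    "paper-1900-01-02-a-i0003",
]
release_ids = [
    "paper-1900-01-02-a-i0001",
    "paper-1900-01-02-a-i0002",
]

report = diff_content_item_ids(interface_ids, release_ids)
for status, ids in report.items():
    print(f"{status}: {len(ids)}")
```

A plain set difference only shows which IDs disappear or appear. It does not say which new content-item replaces which old one, so a proper old-to-new mapping would need extra information, such as the page regions each content-item covers.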

@e-maud e-maud self-assigned this Jun 6, 2024