[BNL - Lux importer] Investigate and fix the logical matching of physical articles and content-items #130

piconti · 2024-05-02T10:34:53Z

As described under patch #4 in this Google Sheet, there is a problem with the logic that separates the identified regions into content-items in the BNL data.

Since the content-items are created and re-merged in the importer's code, some of the problems most likely stem from there.
The goal is therefore to rethink how the BNL importer logically constructs content-items.

Note that this will most probably break current content-item IDs for all concerned BNL data.

Ideally, this is also done when ingesting the updated OCR for the BNL data.

piconti · 2024-06-05T15:35:23Z

New information on this issue:
We found that the version of the data visible on the interface is not the version stored on S3 for the last documented release, but an earlier one.

It may be that the problem with the BNL article reconstruction has in fact already been fixed, in which case no further fix would be needed.
However, this probably means that there will be a break in the content-item IDs.
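
To gauge how large that break would be, one option is to compare the content-item IDs of the version shown on the interface with those of the version on S3. The snippet below is only a minimal sketch of such a comparison: the function name `diff_content_item_ids` and the example IDs are hypothetical placeholders, not taken from the importer or from the actual BNL data, and loading the IDs from either source is left out.

```python
def diff_content_item_ids(old_ids, new_ids):
    """Compare the content-item IDs of two versions of the same data.

    Returns which IDs survive unchanged, which exist only in the old
    version, and which exist only in the new version.
    """
    old, new = set(old_ids), set(new_ids)
    return {
        "kept": sorted(old & new),
        "dropped": sorted(old - new),
        "added": sorted(new - old),
    }


# Hypothetical placeholder IDs, for illustration only.
old_version = [
    "example-1900-01-01-a-i0001",
    "example-1900-01-01-a-i0002",
    "example-1900-01-01-a-i0003",
]
new_version = [
    "example-1900-01-01-a-i0001",
    "example-1900-01-01-a-i0002",
    "example-1900-01-01-a-i0004",
]

report = diff_content_item_ids(old_version, new_version)
for status, ids in report.items():
    print(f"{status}: {len(ids)}")
```

A large "dropped" or "added" count for an issue would suggest that its content-items were segmented differently between the two versions.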

piconti self-assigned this May 2, 2024

e-maud self-assigned this Jun 6, 2024
