Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractAggregator: Aggregation operations are no longer transparent downstream #102

Closed
ppanopticon opened this issue Aug 26, 2024 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@ppanopticon
Copy link
Member

ppanopticon commented Aug 26, 2024

@rahelarnold98 and I just ran into an issue in the XReco context. Due to the recent changes to the AbstractAggregator, the "aggregated" Retrievable no longer holds only aggregated content. Instead, it holds the original content plus the aggregated content in a single Retrievable .

The question now is: By what mechanism can we restrict downstream operators to only work with the aggregated content? It's one thing if an operator is built from scratch. In our case, however, we mostly rely on existing operators that are being configured.

Quite frankly, this change broke our complete video extraction pipeline.

@ppanopticon ppanopticon added the bug Something isn't working label Aug 26, 2024
@ppanopticon ppanopticon added this to the Release Candidate #1 milestone Aug 26, 2024
@ppanopticon ppanopticon changed the title AbstractAggregator: Aggregation Operations no longer transparent downstream AbstractAggregator: Aggregation operations are no longer transparent downstream Aug 26, 2024
@lucaro
Copy link
Member

lucaro commented Aug 26, 2024

That is the downside of the append-only approach. We do have a mechanism to check the author of a content element via the ContentAuthorAttribute, so at least they are distinguishable.

@ppanopticon
Copy link
Member Author

While this may very well be, I currently don't see a way to leverage this in a configurable fashion (i.e., without changing all the operators). Or have I overlooked something?

@faberf
Copy link
Contributor

faberf commented Aug 26, 2024

Hey @ppanopticon sorry for just now getting to this. Yes, with the approach we are going with all of the downstream consumers must filter the content using the contentauthorattribute using a configurable value in the extraction configuration. You can check the FES Extractor class for an example.

@ppanopticon
Copy link
Member Author

Okay I see - thanks for the hint.

However, I would expect the author(s) of such a breaking change to actually adjust existing operators such that they can work as they did before opening a PR. In its current state, the change breaks the pipeline.

@faberf
Copy link
Contributor

faberf commented Aug 26, 2024

Yes, I agree, I actually hadn't considered that this change is breaking before. I will push a fix to dev today, and will edit the wiki.

@ppanopticon
Copy link
Member Author

Fixed by PR #103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants
@lucaro @ppanopticon @faberf and others