Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add msg.partition as an argument to BlueskyConsumer.process_document #17

Open
gwbischof opened this issue Aug 11, 2020 · 0 comments
Open

Comments

@gwbischof
Copy link
Contributor

gwbischof commented Aug 11, 2020

RunRouter cannot always tell which Run a datum or resource belongs to because resource documents don't always have a run_start uid. When a resource or datum's run can't be identified then it is distributed to all runs.
https://github.com/bluesky/event-model/blob/master/event_model/__init__.py#L1295

I am planning on working on an archiver consumer this week, that will create an archive file for each run. I think that this scenario is a problem because because we don't want documents from one Run, to end up in the archive of a different Run.

We can solve this problem by passing msg.partition to the BlueskyConsumer.process_document.
https://github.com/bluesky/bluesky-kafka/blob/master/bluesky_kafka/__init__.py#L405

The Producer publishes runs to a partition based on a UID. If the consumer subscribes to a topic it will receive documents from multiple partitions. If it passes documents from all partitions directly to a RunRouter, this is equivalent to merging all of the partitions together. The RunRouter cannot perfectly separate the Runs due to missing run_start in the resource documents.

If we pass the msg.partition to process_document we can then keep the partitions isolated, which can solve this problem.

Is it possible to do two BlueskyRuns at the same time and have them end up on the same partition, so that we would need to separate runs that are part of the same stream? If not then we know that each time we see a new start document on a partition a new run has started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant