MongoConsumer gets documents from a Kafka partition and inserts them into the correct Mongo database using suitcase-mongo.Serializer.
Each time a stop document is received, the Serializer is closed and a new Serializer is created for the next set of documents. If a run does not have a stop document, the following run ends up using the previous Serializer, so as written a single Serializer may serialize more than one run. For suitcase-mongo.Serializer this shouldn't matter. But for a suitcase that writes an archival format, where we want only one run per file, this will be a problem.
So I think we should add some code, ideally to the base class, that handles runs with missing stop documents. This could work by checking whether a second start document is received before the stop document for the current run.
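A rough sketch of that check, just to make the idea concrete. It is not the existing MongoConsumer code: `OneRunPerSerializer`, `serializer_factory`, and `RecordingSerializer` are hypothetical names, and the only assumption about a serializer is that it can be called with `(name, doc)` and has a `close()` method.

```python
class OneRunPerSerializer:
    """Hypothetical helper that gives each run its own serializer."""

    def __init__(self, serializer_factory):
        # Assumption: serializer_factory is a zero-argument callable returning
        # an object that accepts (name, doc) calls and has a close() method.
        self._factory = serializer_factory
        self._serializer = None

    def __call__(self, name, doc):
        if name == "start" and self._serializer is not None:
            # A second start arrived before any stop, so the previous run
            # is missing its stop document. Close its serializer now so the
            # new run does not get written into the same output.
            self._serializer.close()
            self._serializer = None
        if self._serializer is None:
            # Created lazily for the first document after a close.
            self._serializer = self._factory()
        self._serializer(name, doc)
        if name == "stop":
            # Normal end of a run.
            self._serializer.close()
            self._serializer = None


class RecordingSerializer:
    """Stand-in serializer, used only to make the sketch runnable."""

    def __init__(self):
        self.docs = []
        self.closed = False

    def __call__(self, name, doc):
        self.docs.append((name, doc))

    def close(self):
        self.closed = True
```

With this, a sequence like start, event, start, stop would produce two serializers, the first closed when the second start shows up. Whether the first run should also be flagged as incomplete somewhere is left open here.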