Now, what if the HiveSink and the other sinks that use a LinkedBlockingQueue to feed multiple writer threads could do this in a distributed fashion? The queue could sit behind an interface, with one implementation that wraps a Kafka topic, while the default would still be in-process threads.
The gotcha is that once you are out-of-process you lose control over how the data is partitioned into reasonably sized chunks.
However, for row-oriented storage systems like Kudu and HBase it's a perfect fit, and the same usage pattern would even work for the JdbcSink.
What do you think?
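To make the idea concrete, here is a minimal sketch of the queue abstraction. The names `RowChannel`, `InProcessRowChannel` and `RowChannelDemo` are made up for illustration and are not part of EEL. Only the default in-process variant is shown; a Kafka-backed variant would, as an assumption, implement the same two methods with a producer send and a consumer poll.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Demo driver (hypothetical): plays the role of a single writer/worker.
class RowChannelDemo {
    // Pulls rows until the channel stays empty for the timeout and
    // returns how many rows were taken. A real worker would hand each
    // row to a sink writer instead of only counting it.
    static int drain(RowChannel<List<Object>> channel) throws InterruptedException {
        int written = 0;
        while (channel.poll(50, TimeUnit.MILLISECONDS) != null) {
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        RowChannel<List<Object>> channel = new InProcessRowChannel<>(100);
        channel.put(Arrays.<Object>asList("alice", 1));
        channel.put(Arrays.<Object>asList("bob", 2));
        channel.put(Arrays.<Object>asList("carol", 3));
        System.out.println("rows written: " + drain(channel));
    }
}

// Hypothetical interface: the small slice of BlockingQueue that a sink's
// writers actually need, so the transport behind it can be swapped.
interface RowChannel<T> {
    void put(T row) throws InterruptedException;
    T poll(long timeout, TimeUnit unit) throws InterruptedException;
}

// Default implementation: same behaviour as today, a bounded
// LinkedBlockingQueue shared by threads in one JVM.
final class InProcessRowChannel<T> implements RowChannel<T> {
    private final BlockingQueue<T> queue;

    InProcessRowChannel(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    @Override
    public void put(T row) throws InterruptedException {
        queue.put(row); // blocks when the queue is full (back-pressure)
    }

    @Override
    public T poll(long timeout, TimeUnit unit) throws InterruptedException {
        return queue.poll(timeout, unit); // null if nothing arrives in time
    }
}
```

With this shape the sink code only depends on `RowChannel`, so switching between threads and out-of-process workers would be a configuration choice. It does not address the partitioning gotcha mentioned above.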
Probably not such a great idea, as there may be better approaches.
I understand the multiple writer threads, but what if you could flip a switch to share the writes across stateless worker processes, i.e. spin up Kafka/YARN workers?
Each YARN worker accepts rows over, say, Kafka and uses EEL to write them.
Support Distributed writes with EEL
What do you think?