Support Distributed writes with EEL #253

hannesmiller · 2017-02-22T20:55:29Z

Support Distributed writes with EEL

N writers via JdbcSource -> KafkaSink
N writers via HiveSink/KuduSink/HBaseSink
What if HiveSink, and the other sinks that use a LinkedBlockingQueue to feed multiple writer threads, could do this in a distributed fashion? The idea is to put the LinkedBlockingQueue behind an interface, so that an alternative implementation could wrap a Kafka topic while the default implementation remains thread-based.
The gotcha is that once the writers are out of process, you lose control over how to partition the data into reasonably sized chunks.
However, for row-oriented storage systems such as Kudu and HBase this is a perfect fit, and the same usage pattern would even work for JdbcSink.
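A minimal sketch of the idea in Scala (the language EEL is written in), assuming a hypothetical `RowQueue` trait that is not part of EEL's actual API. The default `LocalRowQueue` is simply a `LinkedBlockingQueue`, so behaviour stays thread-based; a Kafka-backed implementation would plug in behind the same two methods. `RowQueueDemo.runWriters` and the `None` end-of-stream marker are likewise illustrative assumptions.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, LinkedBlockingQueue, TimeUnit}

// Hypothetical abstraction (not part of EEL's API): the minimal surface a sink
// needs from the queue that feeds its writer threads.
trait RowQueue[T] {
  def put(row: T): Unit // blocks while the queue is full
  def take(): T         // blocks until a row is available
}

// Default, in-process implementation: the current thread-based behaviour.
final class LocalRowQueue[T](capacity: Int) extends RowQueue[T] {
  private val underlying = new LinkedBlockingQueue[T](capacity)
  def put(row: T): Unit = underlying.put(row)
  def take(): T = underlying.take()
}

// A Kafka-backed RowQueue would implement put/take with a producer and a
// consumer on a shared topic; it is intentionally not shown here.

object RowQueueDemo {
  // Pushes `rows` through `queue` to `writers` threads. None marks end-of-stream
  // (one marker per writer). Each "write" just records the row.
  def runWriters(rows: Seq[String], queue: RowQueue[Option[String]], writers: Int): Seq[String] = {
    val written = new ConcurrentLinkedQueue[String]()
    val pool = Executors.newFixedThreadPool(writers)
    for (_ <- 1 to writers) {
      pool.execute(new Runnable {
        def run(): Unit = {
          var done = false
          while (!done) {
            queue.take() match {
              case Some(row) => written.add(row) // a real sink would write to Hive/Kudu/HBase here
              case None      => done = true      // this writer's end-of-stream marker
            }
          }
        }
      })
    }
    rows.foreach(r => queue.put(Some(r)))
    (1 to writers).foreach(_ => queue.put(None))
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)
    written.toArray(new Array[String](0)).toSeq
  }
}
```

For example, `RowQueueDemo.runWriters((1 to 100).map(i => s"row-$i"), new LocalRowQueue[Option[String]](16), 4)` returns all 100 rows, in whatever order the four writer threads happened to take them. Swapping in a different `RowQueue` would leave the writer loop untouched.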

What do you think?

sksamuel · 2017-02-23T02:23:09Z

I don't understand, sorry. At the moment HiveSink has multiple writers, through multiple threads. Do you mean something else?

hannesmiller · 2017-02-23T09:36:38Z

Probably not such a great idea, as there may be better approaches.

I understand the multi-threaded writers, but what if you could flip a switch to spread the writes over stateless worker processes, i.e. spin up Kafka/YARN workers?

Each YARN worker would accept rows over, say, Kafka and use EEL to write them.
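A sketch of such a worker, under stated assumptions: a `LinkedBlockingQueue` stands in for the inbound Kafka topic and `writeBatch` stands in for an EEL sink write; `BatchingWorker` and `pollOnce` are hypothetical names. Capping each drain at `batchSize` is one way a worker could keep write sizes bounded, which speaks to the partitioning concern raised in the original post.

```scala
import java.util.ArrayList
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Hypothetical stateless worker. The LinkedBlockingQueue stands in for the
// inbound Kafka topic; `writeBatch` stands in for an EEL sink write.
final class BatchingWorker[T <: AnyRef](
    inbound: LinkedBlockingQueue[T],
    batchSize: Int,
    writeBatch: Seq[T] => Unit) {
  require(batchSize >= 1, "batchSize must be at least 1")

  // Waits up to timeoutMs for one row, then drains whatever else is already
  // queued (up to batchSize rows in total) and writes it as a single batch.
  // Returns the number of rows written; 0 means the queue stayed empty.
  def pollOnce(timeoutMs: Long): Int = {
    val first = inbound.poll(timeoutMs, TimeUnit.MILLISECONDS)
    if (first == null) 0
    else {
      val buffer = new ArrayList[T]()
      buffer.add(first)
      inbound.drainTo(buffer, batchSize - 1)
      val batch = (0 until buffer.size()).map(i => buffer.get(i))
      writeBatch(batch)
      batch.size
    }
  }
}
```

With 10 queued rows and `batchSize = 4`, repeated `pollOnce` calls write batches of 4, 4 and 2 rows, then return 0 once the queue stays empty for the timeout. Because the worker holds no state between calls, any number of them could share one inbound source.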

sksamuel · 2017-02-23T10:41:55Z

You can, but then it's no longer a standalone, in-process API; it turns into something more like Spark.

sksamuel added the question label Jul 14, 2017

garyfrost added this to the 1.4 milestone Feb 5, 2018

garyfrost added enhancement and removed question labels Feb 5, 2018

garyfrost assigned hannesmiller Feb 5, 2018
