
Support Distributed writes with EEL #253

Open
hannesmiller opened this issue Feb 22, 2017 · 3 comments

@hannesmiller
Contributor

Support Distributed writes with EEL

  • N writers via JdbcSource -> KafkaSink
  • N writers via HiveSink/KuduSink/HBaseSink
  • What if HiveSink, and the other sinks that use a LinkedBlockingQueue to feed multiple writer threads, could do this in a distributed fashion? The LinkedBlockingQueue would sit behind an interface, so one implementation could wrap a Kafka topic while the default one would still use threads.
  • The gotcha is that once you are out-of-process, you lose control over how the data is partitioned into reasonably sized chunks.
  • However, for row-oriented storage systems like Kudu and HBase it's perfect; the same usage pattern would even work for the JdbcSink.
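A minimal sketch of the idea in Java, under stated assumptions. The hand-off between a sink and its writer threads sits behind a small interface, and the current thread-based behaviour is the default implementation. `RowQueue`, `InProcessRowQueue`, `writeAll` and the `END` sentinel are hypothetical names, not part of EEL's API. The counter stands in for a real Hive/Kudu/HBase write. A Kafka-backed `RowQueue` is not shown: it would need the Kafka client library and would still face the partitioning gotcha mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueSinkSketch {

    // Hypothetical abstraction over the sink-to-writer hand-off. A distributed
    // variant (e.g. one wrapping a Kafka topic) would implement these two
    // methods; it is not shown here.
    interface RowQueue<T> {
        void put(T item) throws InterruptedException;
        T take() throws InterruptedException;
    }

    // Default: writer threads in the same process share a LinkedBlockingQueue.
    static final class InProcessRowQueue<T> implements RowQueue<T> {
        private final BlockingQueue<T> queue;

        InProcessRowQueue(int capacity) {
            this.queue = new LinkedBlockingQueue<>(capacity);
        }

        @Override
        public void put(T item) throws InterruptedException { queue.put(item); }

        @Override
        public T take() throws InterruptedException { return queue.take(); }
    }

    // Sentinel telling a writer there are no more rows.
    private static final Object END = new Object();

    // Feeds rows through the queue to N writers and returns how many were written.
    static int writeAll(RowQueue<Object> queue, List<String> rows, int writers)
            throws InterruptedException {
        AtomicInteger written = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int i = 0; i < writers; i++) {
            pool.execute(() -> {
                try {
                    // Each writer stops after consuming exactly one sentinel.
                    while (queue.take() != END) {
                        written.incrementAndGet(); // stand-in for a real sink write
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (String row : rows) queue.put(row);
        for (int i = 0; i < writers; i++) queue.put(END); // one sentinel per writer
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return written.get();
    }

    // Convenience overload that uses the default in-process queue.
    static int writeAll(List<String> rows, int writers) throws InterruptedException {
        return writeAll(new InProcessRowQueue<>(100), rows, writers);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeAll(Arrays.asList("a", "b", "c"), 2)); // prints 3
    }
}
```

In this sketch, switching to a distributed queue would only change which `RowQueue` is passed to `writeAll`. The writer loop itself stays the same.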

What do you think?

@sksamuel
Contributor

Sorry, I don't understand. At the moment HiveSink has multiple writers, through multiple threads. Do you mean something else?

@hannesmiller
Contributor Author

Probably not such a great idea, as there may be better approaches.

I understand the multiple writer threads, but what if you could flip a switch to spread the writes over stateless worker processes, i.e. spin up Kafka/YARN workers?

  • Each YARN worker accepts rows over, say, Kafka and uses EEL to write them.

@sksamuel
Contributor

You can, but then it's no longer a standalone, in-process API; it starts turning into Spark.

@garyfrost garyfrost added this to the 1.4 milestone Feb 5, 2018
3 participants