Data Sharding #286

Open
Max-Meldrum opened this issue Nov 8, 2021 · 2 comments
Labels
domain: api Anything related to the Arcon API domain: state Anything related to Arcon State epic feature

@Max-Meldrum

The following are some aspects we need to address/implement to enable data parallelism:

  • KeyHasher (e.g., Murmur3)
  • Range vs. Hash Sharding
  • KeyExtractor
  • Keyed Streams (KeyedStream, KeyedArrowStream)
  • Region & Shards
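To make the first three items concrete, here is a minimal sketch of how key extraction, hash sharding, and range sharding could fit together. None of these names come from Arcon: `hash_shard`, `range_shard`, and `shard_of` are hypothetical, and the standard library's `DefaultHasher` stands in for Murmur3 only to keep the example dependency-free. Its algorithm is not guaranteed to stay the same across Rust releases, so a real implementation would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a key to a u64. Stand-in for a KeyHasher such as Murmur3 (assumption).
pub fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hash sharding: the shard index is the key hash modulo the shard count.
pub fn hash_shard<K: Hash>(key: &K, num_shards: u64) -> u64 {
    assert!(num_shards > 0, "num_shards must be non-zero");
    hash_key(key) % num_shards
}

/// Range sharding: `upper_bounds` is sorted and holds the exclusive upper
/// bound of every shard except the last, which takes all remaining keys.
/// Returns an index in `0..=upper_bounds.len()`.
pub fn range_shard(key: u64, upper_bounds: &[u64]) -> usize {
    upper_bounds.partition_point(|&bound| bound <= key)
}

/// Key extraction: `extract` plays the role of a KeyExtractor, pulling the
/// partitioning key out of a record before it is hash-sharded.
pub fn shard_of<T, K: Hash>(record: &T, extract: impl Fn(&T) -> K, num_shards: u64) -> u64 {
    hash_shard(&extract(record), num_shards)
}
```

Hash sharding spreads keys evenly but loses key ordering, while range sharding keeps neighbouring keys on the same shard at the cost of possible skew, which is presumably the trade-off behind the "Range vs. Hash Sharding" item.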
@Max-Meldrum Max-Meldrum added domain: api Anything related to the Arcon API domain: state Anything related to Arcon State labels Nov 8, 2021
@segeljakt

segeljakt commented Nov 8, 2021

One question: should it be possible for operators to be generic over KeyedStream<K, T> and Stream<T>? For example, the map operator in:

let stream0: Stream<_> = ...;
let stream1: Stream<_> = stream0.map(...);

let stream2: KeyedStream<_, _> = ...;
let stream3: KeyedStream<_, _> = stream2.map(...);

If not, we need separate operator implementations for each kind of stream. Flink handles this with subtyping. Rust has no subtyping, but we could maybe use traits to achieve something similar:

trait Stream<T> { ... }
trait KeyedStream<K, T>: Stream<T> { ... }
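
As a rough illustration of the supertrait idea, the sketch below shows one `map` operator accepting both kinds of stream, plus an operator that only accepts keyed streams. Everything beyond the two trait names is an assumption made for the example: the `records` and `key_of` methods, the in-memory `VecStream` and `KeyedVecStream` types, and the free functions `map` and `count_by_key` are hypothetical and not part of Arcon.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Anything that yields records of type T (in-memory here, for brevity).
pub trait Stream<T> {
    fn records(&self) -> &[T];
}

/// A keyed stream is a stream that can also report the key of a record.
pub trait KeyedStream<K, T>: Stream<T> {
    fn key_of(&self, record: &T) -> K;
}

/// Hypothetical unkeyed stream backed by a Vec.
pub struct VecStream<T> {
    pub records: Vec<T>,
}

/// Hypothetical keyed stream: a Vec plus a key-extraction function.
pub struct KeyedVecStream<K, T> {
    pub records: Vec<T>,
    pub key_fn: fn(&T) -> K,
}

impl<T> Stream<T> for VecStream<T> {
    fn records(&self) -> &[T] {
        &self.records
    }
}

impl<K, T> Stream<T> for KeyedVecStream<K, T> {
    fn records(&self) -> &[T] {
        &self.records
    }
}

impl<K, T> KeyedStream<K, T> for KeyedVecStream<K, T> {
    fn key_of(&self, record: &T) -> K {
        (self.key_fn)(record)
    }
}

/// One map implementation for every `Stream<T>`, keyed or not.
/// Note that the result is always an unkeyed `VecStream`.
pub fn map<T, U, S: Stream<T>>(stream: &S, f: impl Fn(&T) -> U) -> VecStream<U> {
    VecStream {
        records: stream.records().iter().map(f).collect(),
    }
}

/// An operator that needs keys and therefore only accepts keyed streams.
pub fn count_by_key<K: Hash + Eq, T, S: KeyedStream<K, T>>(stream: &S) -> HashMap<K, usize> {
    let mut counts = HashMap::new();
    for record in stream.records() {
        *counts.entry(stream.key_of(record)).or_insert(0) += 1;
    }
    counts
}
```

This covers the "same map on both" half of the question, but not the typing in the example above: here mapping a keyed stream returns an unkeyed one. Keeping the KeyedStream type through map would likely need an associated output type on the trait, which this sketch leaves out.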

@Max-Meldrum

It would be nice to have the same map on both. Yeah, we may need to use traits in that way. Something to discuss tomorrow 😄
