feat: Initiate Join #29

Merged 28 commits on Sep 8, 2024

@farbodahm (Owner) commented Jun 11, 2024

This PR initiates all required fundamentals for implementing streaming joins. It includes:

  • Protobuf serialization
  • State Store
    • Pebble for production use case
    • In memory for tests
  • Modifications to the main SDF so it can track the SDFs it depends on
  • Functions for StreamTable-Inner join
  • Tests
  • Benchmark tests
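
The stream-table inner join listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Record`, `InnerJoin`, and the `user_id` column are hypothetical names, and the table side is assumed to be a plain in-memory map keyed by the join column. Each stream record looks up its key in the table state and is emitted only when a match exists.

```go
package main

import "fmt"

// Record is a hypothetical row type: column name -> value.
type Record map[string]any

// InnerJoin joins one stream record against table state keyed by the join column.
// It returns the merged record and true on a match, or nil and false otherwise.
func InnerJoin(stream Record, table map[string]Record, streamKey string) (Record, bool) {
	key, ok := stream[streamKey].(string)
	if !ok {
		return nil, false // join column missing or not a string
	}
	match, found := table[key]
	if !found {
		return nil, false // inner join: unmatched stream records are dropped
	}
	out := make(Record, len(stream)+len(match))
	for k, v := range match {
		out[k] = v
	}
	for k, v := range stream {
		out[k] = v // stream columns win on name collisions
	}
	return out, true
}

func main() {
	table := map[string]Record{
		"u1": {"user_id": "u1", "name": "Ada"},
	}
	events := []Record{
		{"user_id": "u1", "amount": 10},
		{"user_id": "u2", "amount": 5}, // no table row, so it is dropped
	}
	for _, e := range events {
		if joined, ok := InnerJoin(e, table, "user_id"); ok {
			fmt.Println(joined["name"], joined["amount"])
		}
	}
}
```

In a real pipeline the map would presumably be replaced by the state store, with table records serialized before being written and deserialized on lookup.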


  • BenchmarkAddStaticColumnFunction Duration: 6.01s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.01s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.02s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.12s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.01s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.11s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳

@farbodahm (Owner, Author) commented:

Adding a reminder for when I'm back from the Pytest Hackathon ;)

Up to this point in the PR, we can serialize a record's data to Protocol Buffers and deserialize it back.
The next step is to add Pebble as the state store, behind the Repository pattern.
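
As a rough sketch of what a repository-style state store could look like, the Go example below defines an interface and an in-memory implementation of the kind that would suit tests. All names here (`StateStore`, `MemoryStore`, `ErrNotFound`) are assumptions made for illustration and are not taken from this PR; a Pebble-backed type would be a second implementation of the same interface and is omitted so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned when a key is absent (hypothetical name).
var ErrNotFound = errors.New("state store: key not found")

// StateStore is a hypothetical repository interface. Values are raw bytes,
// for example a record serialized to Protocol Buffers.
type StateStore interface {
	Set(key string, value []byte) error
	Get(key string) ([]byte, error)
	Close() error
}

// MemoryStore is an in-memory StateStore, safe for concurrent use.
type MemoryStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{data: make(map[string][]byte)}
}

func (m *MemoryStore) Set(key string, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	// Copy so later mutation of the caller's slice cannot change stored state.
	m.data[key] = append([]byte(nil), value...)
	return nil
}

func (m *MemoryStore) Get(key string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), v...), nil
}

func (m *MemoryStore) Close() error { return nil }

func main() {
	var store StateStore = NewMemoryStore()
	defer store.Close()

	_ = store.Set("user:1", []byte("serialized record"))
	v, err := store.Get("user:1")
	fmt.Println(string(v), err)
}
```

Callers that depend only on the interface can swap the in-memory store for a persistent one without changing the join logic.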


  • BenchmarkAddStaticColumnFunction Duration: 6.02s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳

@farbodahm farbodahm force-pushed the feature/init-join branch from 14802ae to 587af29 Compare June 25, 2024 21:02
@farbodahm farbodahm force-pushed the feature/init-join branch from 587af29 to 9898c82 Compare June 25, 2024 21:04

  • BenchmarkAddStaticColumnFunction Duration: 6.01s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.30s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 6.12s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


  • BenchmarkAddStaticColumnFunction Duration: 5.92s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.81s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.21s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Jul 6, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.72s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.81s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.71s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Jul 9, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.91s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.71s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.51s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.50s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳

@farbodahm (Owner, Author) commented:

This PR took longer than I expected because I got distracted reading papers such as "Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams" and testing different join algorithms locally.

Now I will continue with the basic model and implement a simple join.

@farbodahm farbodahm linked an issue Aug 17, 2024 that may be closed by this pull request

  • BenchmarkAddStaticColumnFunction Duration: 5.82s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.20s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 1, 2024

  • BenchmarkAddStaticColumnFunction Duration: 6.12s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.51s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 1, 2024

  • BenchmarkAddStaticColumnFunction Duration: 6.11s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.81s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 4.11s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 1, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.92s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.51s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.11s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 1, 2024

  • BenchmarkAddStaticColumnFunction Duration: 6.12s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.60s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.11s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 3, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.81s ⏳
  • BenchmarkFilterFunction Duration: 1.51s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.31s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.41s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 3, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.92s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.71s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.41s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 4, 2024

  • BenchmarkAddStaticColumnFunction Duration: 7.35s ⏳
  • BenchmarkFilterFunction Duration: 2.40s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.81s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 2.41s ⏳
  • BenchmarkProtobufSerialization Duration: 2.61s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.71s ⏳
  • BenchmarkSchemaValidation Duration: 2.71s ⏳
  • BenchmarkSelect Duration: 2.71s ⏳

@farbodahm farbodahm changed the title feat: add record to Protobuf struct converter feat: Initiate Join Sep 4, 2024

github-actions bot commented Sep 7, 2024

  • BenchmarkAddStaticColumnFunction Duration: 6.01s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 3.51s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 1.81s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 7, 2024

  • BenchmarkAddStaticColumnFunction Duration: 6.12s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.71s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 2.01s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.21s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳


github-actions bot commented Sep 7, 2024

  • BenchmarkAddStaticColumnFunction Duration: 5.91s ⏳
  • BenchmarkFilterFunction Duration: 1.50s ⏳
  • BenchmarkStreamTableInnerJoin Duration: 3.10s ⏳
  • BenchmarkPebbleStateStore_Set Duration: 2.50s ⏳
  • BenchmarkPebbleStateStore_GetWithCache Duration: 1.91s ⏳
  • BenchmarkPebbleStateStore_GetWithoutCache Duration: 3.21s ⏳
  • BenchmarkProtobufSerialization Duration: 2.41s ⏳
  • BenchmarkRenameColumnFunction Duration: 2.40s ⏳
  • BenchmarkSchemaValidation Duration: 2.40s ⏳
  • BenchmarkSelect Duration: 2.40s ⏳

@farbodahm farbodahm marked this pull request as ready for review September 7, 2024 22:36
@farbodahm farbodahm added the enhancement New feature or request label Sep 8, 2024
@farbodahm farbodahm merged commit 0ad3fbe into main Sep 8, 2024
3 checks passed
@farbodahm farbodahm deleted the feature/init-join branch September 8, 2024 16:14
Linked issue: New Functionality: Inner Join