Implement faster join traversal #14539

Dandandan · 2025-02-07T08:34:14Z

Which issue does this PR close?

We can speed up finding matching indices by separating the lookups and chain traversal.

Closes #.

Todo

Implement outputting in batches
Keep ordering of results(?)
Run more benchmarks (e.g. h2o join / imdb ...)

Rationale for this change

What changes are included in this PR?

This simplifies the algorithm for traversing the chain and makes it more vectorizable.
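
To illustrate the idea of separating lookups from chain traversal, here is a minimal sketch. It is not the code in this PR: `ChainedIndex`, `lookup_heads`, and `traverse` are hypothetical names, and the layout (a `heads` array of 1-based row indices with 0 meaning "empty", plus a `next` array linking rows that share a bucket) is an assumption made for the example. Phase 1 is a simple loop over the probe hashes that only fetches the chain head per row; phase 2 walks the chains. The emitted pairs are candidates only, since different keys can land in the same bucket, so a real join would still compare the key values afterwards.

```rust
/// Hypothetical chained index over the build side (illustration only).
/// `heads[bucket]` is the 1-based index of the last build row inserted into
/// that bucket (0 = empty bucket). `next[row]` is the 1-based index of the
/// previously inserted row in the same bucket (0 = end of chain).
struct ChainedIndex {
    heads: Vec<u32>,
    next: Vec<u32>,
}

impl ChainedIndex {
    /// Build the index from precomputed build-side hashes.
    /// `num_buckets` must be greater than zero.
    fn build(build_hashes: &[u64], num_buckets: usize) -> Self {
        let mut heads = vec![0u32; num_buckets];
        let mut next = vec![0u32; build_hashes.len()];
        for (row, &hash) in build_hashes.iter().enumerate() {
            let bucket = (hash % num_buckets as u64) as usize;
            // Link the new row in front of the existing chain.
            next[row] = heads[bucket];
            heads[bucket] = row as u32 + 1;
        }
        Self { heads, next }
    }

    /// Phase 1: fetch the chain head for every probe row.
    /// No chain walking happens here, so the loop body is uniform per row.
    fn lookup_heads(&self, probe_hashes: &[u64]) -> Vec<u32> {
        let num_buckets = self.heads.len() as u64;
        probe_hashes
            .iter()
            .map(|&hash| self.heads[(hash % num_buckets) as usize])
            .collect()
    }

    /// Phase 2: walk each chain and emit candidate (build, probe) index pairs.
    fn traverse(&self, heads: &[u32]) -> (Vec<u32>, Vec<u32>) {
        let mut build_idx = Vec::new();
        let mut probe_idx = Vec::new();
        for (probe_row, &head) in heads.iter().enumerate() {
            let mut cur = head;
            while cur != 0 {
                build_idx.push(cur - 1); // convert back to a 0-based row index
                probe_idx.push(probe_row as u32);
                cur = self.next[(cur - 1) as usize];
            }
        }
        (build_idx, probe_idx)
    }
}
```

A caller would run `lookup_heads` once per probe batch and pass the result to `traverse`. Within a chain, rows come out in reverse insertion order in this sketch.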

--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ join_vectorization ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  981.31ms │           952.00ms │     no change │
│ QQuery 2     │  157.88ms │           152.47ms │     no change │
│ QQuery 3     │  475.15ms │           472.37ms │     no change │
│ QQuery 4     │  240.46ms │           231.75ms │     no change │
│ QQuery 5     │  700.03ms │           684.56ms │     no change │
│ QQuery 6     │  157.46ms │           155.71ms │     no change │
│ QQuery 7     │ 1072.19ms │          1017.78ms │ +1.05x faster │
│ QQuery 8     │  740.30ms │           757.31ms │     no change │
│ QQuery 9     │ 1189.08ms │          1172.25ms │     no change │
│ QQuery 10    │  666.24ms │           681.22ms │     no change │
│ QQuery 11    │  101.91ms │           100.25ms │     no change │
│ QQuery 12    │  338.89ms │           325.82ms │     no change │
│ QQuery 13    │  475.00ms │           464.34ms │     no change │
│ QQuery 14    │  266.15ms │           264.06ms │     no change │
│ QQuery 15    │  442.98ms │           447.49ms │     no change │
│ QQuery 16    │  111.29ms │           114.29ms │     no change │
│ QQuery 17    │ 1249.82ms │          1257.36ms │     no change │
│ QQuery 18    │ 2052.52ms │          1772.82ms │ +1.16x faster │
│ QQuery 19    │  464.19ms │           456.00ms │     no change │
│ QQuery 20    │  470.89ms │           453.13ms │     no change │
│ QQuery 21    │ 1637.42ms │          1572.06ms │     no change │
│ QQuery 22    │  156.41ms │           145.56ms │ +1.07x faster │
└──────────────┴───────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                 │ 14147.60ms │
│ Total Time (join_vectorization)   │ 13650.61ms │
│ Average Time (main)               │   643.07ms │
│ Average Time (join_vectorization) │   620.48ms │
│ Queries Faster                    │          3 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │         19 │
└───────────────────────────────────┴────────────┘

--------------------
Benchmark tpch_mem_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ join_vectorization ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  770.46ms │           774.09ms │     no change │
│ QQuery 2     │  125.30ms │           123.01ms │     no change │
│ QQuery 3     │  259.14ms │           247.91ms │     no change │
│ QQuery 4     │  134.04ms │           124.88ms │ +1.07x faster │
│ QQuery 5     │  525.49ms │           516.03ms │     no change │
│ QQuery 6     │   45.93ms │            39.85ms │ +1.15x faster │
│ QQuery 7     │ 1079.09ms │           836.97ms │ +1.29x faster │
│ QQuery 8     │  336.17ms │           327.84ms │     no change │
│ QQuery 9     │  881.10ms │           843.84ms │     no change │
│ QQuery 10    │  406.44ms │           383.09ms │ +1.06x faster │
│ QQuery 11    │   94.73ms │            87.25ms │ +1.09x faster │
│ QQuery 12    │  280.55ms │           264.89ms │ +1.06x faster │
│ QQuery 13    │  292.29ms │           279.96ms │     no change │
│ QQuery 14    │   47.03ms │            48.84ms │     no change │
│ QQuery 15    │  137.05ms │           136.95ms │     no change │
│ QQuery 16    │   94.72ms │            84.73ms │ +1.12x faster │
│ QQuery 17    │  910.07ms │           909.83ms │     no change │
│ QQuery 18    │ 3759.48ms │          3194.88ms │ +1.18x faster │
│ QQuery 19    │  183.15ms │           178.16ms │     no change │
│ QQuery 20    │  242.99ms │           239.17ms │     no change │
│ QQuery 21    │ 1582.68ms │          1505.24ms │     no change │
│ QQuery 22    │   95.75ms │            97.84ms │     no change │
└──────────────┴───────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                 │ 12283.64ms │
│ Total Time (join_vectorization)   │ 11245.27ms │
│ Average Time (main)               │   558.35ms │
│ Average Time (join_vectorization) │   511.15ms │
│ Queries Faster                    │          8 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │         14 │
└───────────────────────────────────┴────────────┘

Are these changes tested?

Are there any user-facing changes?

Dandandan · 2025-02-09T14:25:02Z

Update after implementing emitting in batch size: the in-memory benchmark performs about the same as before, but tpch_10 regressed compared to the earlier implementation (so the difference there seems to have more to do with output batch size in some way).
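
For readers unfamiliar with what "emitting in batch size" involves, a rough sketch follows. It is an assumption-laden illustration, not the PR's implementation: `Resume` and `traverse_limited` are hypothetical names, and `heads` / `next` use an assumed layout of 1-based row indices where 0 means "no row". The traversal stops once `limit` pairs have been produced and returns a resume point, so the next call continues in the middle of a chain if needed.

```rust
/// Hypothetical resume point: the probe row being processed and the
/// 1-based chain position that has not been emitted yet.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Resume {
    probe_row: usize,
    cur: u32,
}

/// Walk the chains, emitting at most `limit` candidate pairs (`limit` > 0).
/// `heads[i]` is the 1-based chain head for probe row `i` (0 = no match);
/// `next[r]` is the 1-based successor of build row `r` (0 = end of chain).
/// Returns the pairs plus `Some(resume)` if more output remains.
fn traverse_limited(
    heads: &[u32],
    next: &[u32],
    start: Option<Resume>,
    limit: usize,
) -> (Vec<u32>, Vec<u32>, Option<Resume>) {
    let mut build_idx = Vec::with_capacity(limit);
    let mut probe_idx = Vec::with_capacity(limit);
    // Continue from the saved position, or start at the first probe row.
    let (first_row, mut pending) = match start {
        Some(r) => (r.probe_row, Some(r.cur)),
        None => (0, None),
    };
    for probe_row in first_row..heads.len() {
        // Only the first row of a resumed call starts mid-chain.
        let mut cur = pending.take().unwrap_or(heads[probe_row]);
        while cur != 0 {
            if build_idx.len() == limit {
                // Batch is full: remember where to continue.
                return (build_idx, probe_idx, Some(Resume { probe_row, cur }));
            }
            build_idx.push(cur - 1); // 0-based build row index
            probe_idx.push(probe_row as u32);
            cur = next[(cur - 1) as usize];
        }
    }
    (build_idx, probe_idx, None)
}
```

The caller would loop, passing the returned `Resume` back in, until it receives `None`. How the limit interacts with downstream operators is outside the scope of this sketch.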

Dandandan · 2025-02-09T15:04:17Z

This is ready for review

Dandandan · 2025-02-09T20:10:21Z

Seeing some regressions in the imdb benchmark.

github-actions bot added the physical-expr Physical Expressions label Feb 7, 2025

Implement faster join algorithm

a6c8ee5

Dandandan force-pushed the join_vectorization branch from 7d1caf4 to a6c8ee5 Compare February 7, 2025 08:35

Dandandan changed the title ~~[WIP] Implement faster join algorithm~~ [WIP] Implement faster join traversal Feb 7, 2025

Dandandan added 3 commits February 7, 2025 09:39

Fmt

4ed9cf3

Fix ordering

e832fde

Fix partial emission

a3e16ef

Dandandan changed the title ~~[WIP] Implement faster join traversal~~ Implement faster join traversal Feb 9, 2025

Dandandan added 3 commits February 9, 2025 15:55

Clippy

3c56471

Fix test

36870a8

Fix test

73afeff

Dandandan marked this pull request as ready for review February 9, 2025 14:59

Dandandan marked this pull request as draft February 9, 2025 20:09
