Better partitioning in the bulk loading algorithm #73

mourner · 2017-04-25T13:58:17Z

Currently, the bulk loading algorithm partitions each node into approximately sqrt(N) x sqrt(N) child nodes. This becomes a problem if a node's bounding box is not square: child nodes get narrower the deeper you go. I noticed this problem when looking at the viz for a rectangular data space:

Notice the very narrow rectangles at the bottom. We could fix this by designing an algorithm that picks a K x M partitioning based on the aspect ratio of a node, so that child nodes approach a square shape no matter how narrow the parent node is. This should improve query performance on bulk-loaded trees.
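To make the K x M idea concrete, here is a minimal sketch of one way such a choice could be made. It is an illustration only, not the code from this repository: the function name `choose_grid` and its parameters are hypothetical, and it assumes the node's bounding box width and height are already known. It tries every column count and keeps the grid whose cells are closest to square.

```python
import math


def choose_grid(n_children, width, height):
    """Pick (cols, rows) with cols * rows >= n_children so that the cells of a
    width x height box are as close to square as possible.

    Hypothetical helper for illustration; brute force is fine because the
    number of children per node is small.
    """
    if n_children < 1 or width <= 0 or height <= 0:
        raise ValueError("n_children, width and height must be positive")

    best = None
    for cols in range(1, n_children + 1):
        # Smallest row count that still leaves room for every child.
        rows = math.ceil(n_children / cols)
        cell_w = width / cols
        cell_h = height / rows
        # Longer side over shorter side: 1.0 means a perfectly square cell.
        ratio = max(cell_w, cell_h) / min(cell_w, cell_h)
        if best is None or ratio < best[0]:
            best = (ratio, cols, rows)
    return best[1], best[2]


if __name__ == "__main__":
    print(choose_grid(9, 100, 100))  # square node
    print(choose_grid(9, 900, 100))  # wide node
```

For a square node this falls back to the usual sqrt x sqrt split, while a node nine times wider than it is tall gets a single row of nine columns. Note that cols * rows can exceed the requested child count, so a real implementation would still need to decide how to distribute items across the cells.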

cc @danpat

mourner · 2017-04-25T20:22:35Z

Making good progress on this. Before and after:

eric-corumdigital · 2019-11-26T19:32:48Z

Were your improvements here merged into master?

mourner · 2019-11-27T09:21:16Z

No, the approach from above was flawed (it made bulk-load performance worse) and I never figured out how to get around that. Maybe I'll try again some time.

mourner · 2019-11-27T09:32:31Z

Pushed the work-in-progress code I had to a7047e9; feel free to poke around. As far as I recall, there were two issues:

- Despite the tree looking much better visually, I couldn't get a meaningful search query improvement in benchmarks. Maybe I measured wrong though.
- I didn't like having to recalculate the bounding box for all items on each iteration. This didn't feel right, although I never found an alternative.
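For readers unfamiliar with the cost in the second point, the recalculation amounts to a linear scan like the one sketched here. This is an assumption-laden illustration, not the code in a7047e9: the helper name `bounding_box` and the `(min_x, min_y, max_x, max_y)` tuple layout are hypothetical.

```python
def bounding_box(items):
    """Return (min_x, min_y, max_x, max_y) covering every item.

    Each item is assumed to be a (min_x, min_y, max_x, max_y) tuple.
    The scan visits every item once, so it is linear in len(items).
    """
    if not items:
        raise ValueError("items must not be empty")
    min_x = min(item[0] for item in items)
    min_y = min(item[1] for item in items)
    max_x = max(item[2] for item in items)
    max_y = max(item[3] for item in items)
    return (min_x, min_y, max_x, max_y)
```

If this ran for every node while partitioning, each level of the recursion would touch every item once more, which is one plausible source of the extra bulk-load time.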

mourner added the enhancement label Apr 25, 2017
