Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better partitioning in the bulk loading algorithm #73

Open
mourner opened this issue Apr 25, 2017 · 4 comments
Open

Better partitioning in the bulk loading algorithm #73

mourner opened this issue Apr 25, 2017 · 4 comments

Comments

@mourner
Copy link
Owner

mourner commented Apr 25, 2017

Currently, the bulk loading algorithm partitions each node into approximately sqrt(N) x sqrt(N) child nodes. This becomes a problem if a node is not a perfect square — child nodes will get narrower the deeper you go. I noticed this problem when looking at the viz for a rectangular data space:

image

Notice the very narrow rectangles at the bottom. We could fix this by designing an algorithm that picks a K x M partitioning that takes the aspect ratio of a node into account, to make child nodes approach square shape no matter how narrow they are. This should make query performance on bulk-loaded trees better.

cc @danpat

@mourner
Copy link
Owner Author

mourner commented Apr 25, 2017

Making good progress on this. Before and after:

omt1
omt2

@eric-corumdigital
Copy link

Were your improvements here merged into master?

@mourner
Copy link
Owner Author

mourner commented Nov 27, 2019

No — the approach from above was flawed (making bulk-load performance worse) and I never figured out how to go around that. Maybe I'll try again some time.

@mourner
Copy link
Owner Author

mourner commented Nov 27, 2019

Pushed the work-in-progress code I had to a7047e9 — feel free to poke around this. As far as I recall now, there were two issues:

  • Despite the tree looking much better visually, I couldn't get a meaningful search query improvement in benchmarks. Maybe I measured wrong though.
  • I didn't like having to recalculate the bounding box for all items on each iteration, this didn't feel right, although I never found an alternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants