
Speed up edge bundling #1383

Open · lmcinnes wants to merge 14 commits into main
Conversation

@lmcinnes commented Jan 7, 2025

The hammer edge bundling is fantastic, but it can be quite time consuming for large graphs (100,000+ edges and upward). I spent some time benchmarking the code, then profiling to determine where the time was being spent, and then attempted some improvements. I quickly discovered that, at least on the machines and setups I tried, the dask usage actually made things significantly slower. I have therefore made dask optional, via a `use_dask` parameter which defaults to False.

I also spent a while trying to wring the most I could out of numba for many of the core or frequently called functions. Primarily this involved adding more annotations to the decorators, along with a careful rewrite of the distance function, which is called extremely often. The remainder of the work was rearranging when and where various computations were done, to avoid duplicate work or to move more loops inside numba where possible.

Lastly, I rewrote `_convert_edge_segments_to_dataframe` to use a single large numpy allocation followed by `zip` and `chain`, rather than a generator with many small allocations. The code is a little less readable, but significantly faster for very large graphs. Both ideas are sketched below.

After all these changes, a typical use case for me (a knn-graph) went from 1h 20m to 15s, and scaled-up versions of the random graph examples from the docs (with n=500 and m=100000) went from 1h 13m for the circular layout and 1h 39m for the force-directed layout to 1m 38s and 60s respectively. This would make edge bundling and graph drawing with datashader (which I love!) far more practical for a much wider range of datasets.

I'll be happy to discuss the changes as required -- some are more important than others, but I went down the optimization rabbit hole and did all the things I could.

@philippjfr (Member) commented Jan 7, 2025

Woah, nice work! Will have to play around a bit.

@amaloney (Contributor) commented Jan 7, 2025

Thanks @lmcinnes, I created issue #1384 for this PR. The goal of that issue is to document the speedup examples and to discuss other topics unrelated to the PR code.

@jbednar (Member) commented Jan 7, 2025

That all sounds really promising. Thanks for the contribution! We'll review and let you know, but as @amaloney suggests, posting some benchmarking examples in the associated issue would be really useful.

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 95.09804% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.40%. Comparing base (6802220) to head (89984f4).
Report is 1 commit behind head on main.

Files with missing lines    Patch %    Lines
datashader/bundling.py      95.09%     5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1383      +/-   ##
==========================================
- Coverage   88.42%   88.40%   -0.02%     
==========================================
  Files          93       93              
  Lines       18707    18727      +20     
==========================================
+ Hits        16541    16556      +15     
- Misses       2166     2171       +5     

