introduce libcugraph wheels #4804
Conversation
`ci/validate_wheel.sh` (outdated)
Are we going to run the size checks for `libcugraph` wheels for each CUDA version? I see `max_allowed_size_compressed = '1.3G'` in the `pyproject.toml`, but that seems like it would be for CUDA 11?
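For reference, pydistcheck reads that limit from `pyproject.toml`; a minimal sketch of the relevant table (the value `'1.3G'` is the one quoted above; assuming the standard `[tool.pydistcheck]` table):

```toml
# Sketch of a pydistcheck size limit in pyproject.toml.
# One shared limit applies to every wheel this project builds,
# regardless of CUDA version.
[tool.pydistcheck]
max_allowed_size_compressed = '1.3G'
```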
I don't see a meaningful size difference across CUDA versions for `libcugraph` wheels here. For example, look at these builds from this PR:

* CUDA 11.8.0, Python 3.12, amd64 (build link)
  * compressed size: 0.9G
  * uncompressed size: 1.4G
* CUDA 12.5.1, Python 3.12, amd64 (build link)
  * compressed size: 0.9G
  * uncompressed size: 1.4G
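As a side note, those compressed / uncompressed numbers can be reproduced locally without CI, since a wheel is just a zip archive. A minimal sketch using only the standard library (the `wheel_sizes` helper name is mine, not project tooling):

```python
import zipfile


def wheel_sizes(path: str) -> tuple[int, int]:
    """Return (compressed, uncompressed) byte totals for a wheel.

    Wheels are zip archives, so per-file sizes come straight from
    the zip central directory.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        return (
            sum(i.compress_size for i in infos),
            sum(i.file_size for i in infos),
        )
```

Usage would be e.g. `wheel_sizes("libcugraph_cu12-25.2.0-...whl")`.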
I think that cuGraph doesn't actually directly need any of the CUDA math libraries (cuBLAS, cuSolver, etc.)... just indirectly via RAFT needing them. We're observing this now that `libcugraph` wheels are linking against a pre-built RAFT delivered by the `libraft` wheels, instead of building RAFT from source here.
I don't see any of those libraries referenced in code here (except some cuRAND usage in C++ tests):

```shell
git grep -i blas cpp/
git grep -i blas python/
git grep -i fft cpp/
git grep -i fft python/
git grep -i solver cpp/
git grep -i solver python/
git grep -i curand cpp/
git grep -i curand python/
```
Does that sound plausible? If so, I'll try dropping the dependency on CUDA math wheels for `libcugraph` wheels here.
I agree with your assessment (confirmed independently). Let's try dropping it.
I did a bit more research to validate my understanding.

On the conda side, RAFT has no run exports that would cause a `host:` dependency on RAFT to add `run:` dependencies on the related CUDA math libraries. We do this because we don't know which math libraries would be relevant for any given RAFT consumer. However, because `libraft` wheels already supply those math libraries, we won't need to specify them for cuGraph: transitive discovery is sufficient.
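To make the conda-side mechanism concrete, here is a hypothetical recipe fragment showing what a run export would look like if RAFT *did* propagate math-library dependencies (it doesn't; this is an illustration of the `run_exports` feature, not the actual `libraft` recipe):

```yaml
# Hypothetical meta.yaml fragment -- NOT the real libraft recipe.
# A run export like this would turn every `host:` dependency on libraft
# into implicit `run:` dependencies on the listed math libraries.
build:
  run_exports:
    - libcublas
    - libcusolver
```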
Ok perfect, thank you.
I've pushed commits here removing the direct dependencies on CUDA math libraries for cuGraph wheels. There's one caveat... they do need to be listed out explicitly in the `--exclude` args passed to `auditwheel repair`, or otherwise `auditwheel` ends up vendoring them. I've done that in `ci/build_wheel_libcugraph.sh`.
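A hedged sketch of the kind of exclusion described above (the library list and variable name here are illustrative, not copied from `ci/build_wheel_libcugraph.sh`):

```shell
# Build up --exclude flags so `auditwheel repair` leaves the CUDA math
# libraries out of the wheel; they are resolved at runtime via the
# libraft wheel instead of being vendored.
EXCLUDE_ARGS=""
for lib in libcublas libcublasLt libcurand libcusolver libcusparse; do
    EXCLUDE_ARGS="${EXCLUDE_ARGS} --exclude ${lib}.so.*"
done
echo "auditwheel repair ${EXCLUDE_ARGS} dist/libcugraph_*.whl"
```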
Replaces #2306, contributes to rapidsai/build-planning#33.

Proposes packaging `libraft` as a wheel, which is then re-used by:

* `pylibraft-cu{11,12}` and `raft-cu{11,12}` (this PR)
* `libcugraph-cu{11,12}`, `pylibcugraph-cu{11,12}`, and `cugraph-cu{11,12}` in rapidsai/cugraph#4804
* `libcuml-cu{11,12}` and `cuml-cu{11,12}` in rapidsai/cuml#6199

As part of this, also proposes:

* introducing a new CMake option, `RAFT_COMPILE_DYNAMIC_ONLY`, to allow building/installing only the dynamic shared library (i.e. skipping the static library)
* enforcing `rapids-cmake`'s preferred CMake style (#2531 (comment))
* making wheel-building CI jobs always depend on other wheel-building CI jobs, not tests or `*-publish` (to reduce end-to-end CI time)

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size Changes" below)
* faster compile times (no more re-compiling RAFT in cuGraph and cuML CI)
* other benefits mentioned in rapidsai/build-planning#33

### Wheel contents

`libraft`:

* `libraft.so` (shared library)
* RAFT headers
* vendored dependencies (`fmt`, CCCL, `cuco`, `cute`, `cutlass`)

`pylibraft`:

* `pylibraft` Python / Cython code and compiled Cython extensions

`raft-dask`:

* `raft-dask` Python / Cython code and compiled Cython extension

### Dependency Flows

In short.... `libraft` contains a `libraft.so` dynamic library and the headers to link against it.

* Anything that needs to link against RAFT at build time pulls in `libraft` wheels as a build dependency.
* Anything that needs RAFT's symbols at runtime pulls it in as a runtime dependency, and calls `libraft.load_library()`.

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (these PRs) | size (before) | size (these PRs) |
|:---:|---:|---:|---:|---:|
| `libraft` | --- | 3169 | --- | 19M |
| `pylibraft` | 64 | 63 | 11M | 1M |
| `raft-dask` | 29 | 28 | 188M | 188M |
| `libcugraph` | --- | 1762 | --- | 903M |
| `pylibcugraph` | 190 | 187 | 901M | 2M |
| `cugraph` | 315 | 313 | 899M | 3.0M |
| `libcuml` | --- | 1766 | --- | 289M |
| `cuml` | 442 | --- | 517M | --- |
| **TOTAL** | **1,040** | **7,268** | **2,516M** | **1,405M** |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

<details><summary>how I calculated those (click me)</summary>

* `cugraph`: nightly commit = rapidsai/cugraph@8507cbf, PR = rapidsai/cugraph#4804
* `cuml`: nightly commit = rapidsai/cuml@7c715c4, PR = rapidsai/cuml#6199
* `raft`: nightly commit = 1b62c41, PR = this PR

```shell
docker run \
  --rm \
  --network host \
  --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
  --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
  --env CUGRAPH_PR="pull-request/4804" \
  --env CUGRAPH_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
  --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
  --env CUML_PR="pull-request/6199" \
  --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
  --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
  --env RAFT_PR="pull-request/2531" \
  --env RAFT_PR_SHA="0d6597b08919f2aae8ac268f1a68d6a8fe5beb4e" \
  --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
  --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
  --env WHEEL_DIR_AFTER=/tmp/wheels-after \
  -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
  bash

# --- nightly wheels --- #
mkdir -p ./wheels-before
export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
  rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after
export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
  rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
  rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck

pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
  --inspect \
  --select 'distro-too-large-compressed' \
  ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
```

</details>

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110
* rapidsai/cuml#6199
* rapidsai/cugraph#4804
I bumped this to use published libraft wheels.
Python tests here are failing as a result of these RMM changes: rapidsai/rmm#1775. Think that might be coming through via
```diff
@@ -109,4 +110,3 @@ find .devcontainer/ -type f -name devcontainer.json -print0 | while IFS= read -r
 done
 
 sed_runner "s/:[0-9][0-9]\.[0-9][0-9]/:${NEXT_SHORT_TAG}/" ./notebooks/README.md
-sed_runner "s/branch-[0-9][0-9].[0-9][0-9]/branch-${NEXT_SHORT_TAG}/" ./docs/cugraph/source/nx_cugraph/nx_cugraph.md
```
To test the changes in this PR, I tested `update-version.sh` like this:

```shell
ci/release/update-version.sh '25.04.00'
```

That revealed that this file doesn't exist any more... because all the docs were moved over to https://github.com/rapidsai/cugraph-docs in #4837. I thought it wasn't worth a new PR and CI run just for this, so I'm just adding it here.
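For readers unfamiliar with the release script: `sed_runner` is presumably a small wrapper around `sed -i`. A hedged sketch of how it behaves (this implementation and the demo file path are illustrative, not copied from `update-version.sh`):

```shell
# Minimal sed_runner sketch: apply an in-place sed expression to a file.
sed_runner() {
    sed -i.bak -e "$1" "$2" && rm -f "$2.bak"
}

# Example: bump a branch reference the way update-version.sh does.
echo "branch-25.02" > /tmp/nx_cugraph_demo.md
sed_runner "s/branch-[0-9][0-9].[0-9][0-9]/branch-25.04/" /tmp/nx_cugraph_demo.md
cat /tmp/nx_cugraph_demo.md   # prints: branch-25.04
```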
Some comments to address.
`ci/build_wheel_libcugraph.sh` (outdated)
```shell
PARALLEL_LEVEL=$(python -c \
  "from math import ceil; from multiprocessing import cpu_count; print(ceil(cpu_count()/4))")
```
This is copied from pylibcugraph, so I want to question it a bit. Are we really limited to only 1/4 of the available cores? I am guessing it's due to memory requirements at compile time?
Can we try to be more aggressive in a follow-up PR? Or can we add comments to explain why this is so limited?
> I am guessing it's due to memory requirements at compile time?
Exactly, this was added back in #4489 (comment) to deal with out-of-memory issues building wheels. Since most of the heavy compilation would move to `libcugraph` wheels in this PR, I've moved this limiting (and requesting a `cpu32` node) over to `libcugraph` here.

HOWEVER... maybe now that these builds aren't also compiling RAFT, and since #4720 reduced the number of TUs being compiled, we could get away with a higher degree of parallelism.

I'll push a change here trying to remove this limit; let's see what happens.
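One possible follow-up along these lines: derive the job count from available memory rather than a fixed divisor. A sketch, assuming each compile job can peak around 4 GiB (that number, and the `parallel_level` helper itself, are hypothetical, not something this PR implements):

```python
import os
from math import ceil


def parallel_level(mem_per_job_gib: float = 4.0) -> int:
    """Pick build parallelism capped by both CPU count and physical RAM."""
    cpus = os.cpu_count() or 1
    try:
        # Physical memory in GiB, via POSIX sysconf.
        mem_gib = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30
    except (AttributeError, ValueError, OSError):
        # No sysconf available: fall back to the existing 1/4-of-cores heuristic.
        return max(1, ceil(cpus / 4))
    return max(1, min(cpus, int(mem_gib // mem_per_job_gib)))
```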
I just pushed a change removing this limiting of `PARALLEL_LEVEL`. I left this requesting a `cpu32` node for now. Let's see how it goes.
```yaml
- depends_on_libraft
- depends_on_librmm
```
It surprises me that we have so many C++ wheel dependencies in pylibcugraph. Does pylibcugraph directly use RAFT / RMM C++ headers? I would have expected that pylibraft / rmm would have sufficed here.
> Does pylibcugraph directly use RAFT / RMM C++ headers?

`pylibcugraph` doesn't, but RAFT / RMM appear in `libcugraph`'s public headers and are therefore `PUBLIC` dependencies in CMake. So those headers need to get into the build environment to link against `libcugraph.so`.
`cugraph/cpp/include/cugraph/algorithms.hpp`, lines 26 to 30 in 0b50bf9:

```cpp
#include <raft/core/device_span.hpp>
#include <raft/core/handle.hpp>
#include <raft/random/rng_state.hpp>
#include <rmm/resource_ref.hpp>
```
Lines 482 to 485 in 0b50bf9:

```cmake
target_link_libraries(cugraph
  PUBLIC
    rmm::rmm
    raft::raft
```
This is related to the discussions from rapidsai/build-planning#92, and it's for similar reasons that e.g. `pylibcudf` has a `librmm` build dependency despite not directly including any RMM headers.
Contributes to rapidsai/build-planning#33.

Adjusts the `rapids-build-utils` manifest for release 25.02 to account for the introduction of new `libcugraph` wheels (rapidsai/cugraph#4804).

## Notes for Reviewers

This shouldn't be merged still pointing at my forks. Plan:

1. admin-merge rapidsai/cugraph#4804 once everything except devcontainers CI there is passing
2. point this PR at upstream `rapidsai/cugraph`
3. observe CI passing and merge this normally (or admin-merge to save time)

---------

Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Paul Taylor <[email protected]>
/merge
Replaces #4340, contributes to rapidsai/build-planning#33.

Proposes packaging `libcugraph` as a wheel, which is then re-used by `cugraph-cu{11,12}` and `pylibcugraph-cu{11,12}` wheels.

## Notes for Reviewers
### Benefits of these changes

* smaller wheels: no more `pylibcugraph` and `cugraph` both holding copies of `libcugraph.so`
* faster `pylibcugraph` and `cugraph` wheel builds

### Wheel contents
`libcugraph`:

* `libcugraph.so` (shared library)
* vendored dependencies (`fmt`, `spdlog`, CCCL, `cuco`)

`pylibcugraph`:

* `pylibcugraph` Python / Cython code and compiled Cython extensions

`cugraph`:

* `cugraph` Python / Cython code and compiled Cython extension

### Dependency Flows
In short.... `libcugraph` contains `libcugraph.so` and `libcugraph_c.so` dynamic libraries and the headers to link against them.

* Anything that needs to link against cuGraph at build time pulls in `libcugraph` wheels as a build dependency.
* Anything that needs cuGraph's symbols at runtime pulls it in as a runtime dependency, and calls `libcugraph.load_library()`.

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)
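The `load_library()` pattern can be sketched roughly like this (a simplified illustration of the idea only; the path layout and fallback behavior are assumptions, not the actual `libcugraph` implementation):

```python
import ctypes
import importlib.util
import os


def load_library(soname: str = "libcugraph.so", package: str = "libcugraph"):
    """Load a shared library shipped inside a wheel with RTLD_GLOBAL,
    so its symbols are visible to compiled extensions imported later.

    Returns None if the wheel providing the library is not installed.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    libdir = os.path.join(list(spec.submodule_search_locations)[0], "lib64")
    return ctypes.CDLL(os.path.join(libdir, soname), mode=ctypes.RTLD_GLOBAL)
```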
### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (this PR) | size (before) | size (this PR) |
|:---:|---:|---:|---:|---:|
| `libcugraph` | --- | 1762 | --- | 903M |
| `pylibcugraph` | 190 | 187 | 901M | 2M |
| `cugraph` | 315 | 313 | 899M | 3.0M |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

This is a cuGraph-specific slice of the table from rapidsai/raft#2531. See that PR for details.

### How I tested this

These other PRs: