
introduce libcugraph wheels #4804

Merged: 72 commits merged into rapidsai:branch-25.02 on Jan 18, 2025

Conversation

@jameslamb (Member) commented Dec 5, 2024

Replaces #4340, contributes to rapidsai/build-planning#33.

Proposes packaging libcugraph as a wheel, which is then re-used by cugraph-cu{11,12} and pylibcugraph-cu{11,12} wheels.
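
For end users this should be mostly invisible; the new C++ wheel simply arrives as a dependency of the Python wheels. A hedged sketch, assuming the `-cu{11,12}`-suffixed package names above are published:

```shell
# Hedged sketch, assuming the -cu12 package names described above are published.
pip install cugraph-cu12    # should pull in pylibcugraph-cu12 and libcugraph-cu12
pip show libcugraph-cu12    # confirm the C++ wheel was installed as a dependency
```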

Notes for Reviewers

Benefits of these changes

Wheel contents

libcugraph:

  • libcugraph.so (shared library)
  • cuGraph headers
  • vendored dependencies (fmt, spdlog, CCCL, cuco)

pylibcugraph:

  • pylibcugraph Python / Cython code and compiled Cython extensions

cugraph:

  • cugraph Python / Cython code and compiled Cython extension

Dependency Flows

In short... the libcugraph wheel contains the libcugraph.so and libcugraph_c.so dynamic libraries and the headers needed to link against them.

  • Anything that needs to link against cuGraph at build time pulls in libcugraph wheels as a build dependency.
  • Anything that needs cuGraph's symbols at runtime pulls it in as a runtime dependency, and calls libcugraph.load_library().
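
A minimal sketch of that runtime flow (in practice the dependent packages make the load_library() call themselves at import time, per the bullet above; the explicit call here is only illustrative):

```shell
# Minimal sketch, assuming libcugraph-cu12 and pylibcugraph-cu12 wheels are installed.
# load_library() loads libcugraph.so / libcugraph_c.so so that compiled extensions
# depending on them can resolve cuGraph's symbols at import time.
python -c "import libcugraph; libcugraph.load_library(); import pylibcugraph"
```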

For more details and some flowcharts, see rapidsai/build-planning#33 (comment)

Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (this PR) | size (before) | size (this PR) |
|:---------------|------------------:|------------------:|--------------:|--------------:|
| libcugraph | --- | 1762 | --- | 903M |
| pylibcugraph | 190 | 187 | 901M | 2M |
| cugraph | 315 | 313 | 899M | 3M |
| TOTAL | 505 | 2,262 | 1,800M | 908M |

NOTES: size = compressed, "before" = 2025-01-13 nightlies

This is a cuGraph-specific slice of the table from rapidsai/raft#2531. See that PR for details.

How I tested this

These other PRs:


@jameslamb marked this pull request as ready for review January 14, 2025 16:59
@jameslamb requested review from a team as code owners January 14, 2025 16:59
Contributor

Are we going to run the size checks for libcugraph wheels for each CUDA version? I see max_allowed_size_compressed = '1.3G' in the pyproject.toml but that seems like it would be for CUDA 11?

Member Author

I don't see a meaningful size difference across CUDA versions for libcugraph wheels here.

For example, look at these builds from this PR:

CUDA 11.8.0, Python 3.12, amd64 (build link)

file size
  * compressed size: 0.9G
  * uncompressed size: 1.4G

CUDA 12.5.1, Python 3.12, amd64 (build link)

file size
  * compressed size: 0.9G
  * uncompressed size: 1.4G

I think cuGraph doesn't actually need any of the CUDA math libraries (cuBLAS, cuSolver, etc.) directly... only indirectly, via RAFT. We're seeing this now because libcugraph wheels link against a pre-built RAFT delivered by the libraft wheels, instead of building RAFT from source here.

I don't see any of those libraries referenced in the code here (except some cuRAND usage in C++ tests):

git grep -i blas cpp/
git grep -i blas python/

git grep -i fft cpp/
git grep -i fft python/

git grep -i solver cpp/
git grep -i solver python/

git grep -i curand cpp/
git grep -i curand python/

Does that sound plausible? If so, I'll try dropping the dependency on CUDA math wheels for libcugraph wheels here.

Contributor

@bdice commented Jan 16, 2025

I agree with your assessment (confirmed independently). Let's try dropping it.

Contributor

I did a bit more research to validate my understanding.

On the conda side, RAFT has no run exports that would cause a host: dependency on RAFT to add run: dependencies on the related CUDA math libraries. We do this because we don't know which math libraries would be relevant for any given RAFT consumer. However, because libraft wheels already supply those math libraries, we won't need to specify them for cuGraph. Transitive discovery is sufficient.
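
A hedged sketch of one way to spot-check that: libcugraph.so should have no direct DT_NEEDED entries on the math libraries (the install path below is illustrative, not taken from this PR).

```shell
# Hedged sketch: inspect libcugraph.so's direct DT_NEEDED entries; the CUDA math
# libraries should not appear, since they are reached transitively via RAFT.
# The install path below is illustrative, not taken from this PR.
readelf -d /path/to/site-packages/libcugraph/lib64/libcugraph.so \
  | grep 'NEEDED' | grep -Ei 'blas|solver|sparse|fft|rand'
```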

Member Author

Ok perfect, thank you.

I've pushed commits here removing the direct dependencies on CUDA math libraries for cuGraph wheels. There's one caveat... they do need to be listed explicitly in the --exclude args passed to auditwheel repair; otherwise auditwheel ends up vendoring them. I've done that in ci/build_wheel_libcugraph.sh.
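
For reference, a hedged sketch of what those exclusions look like (the real list and flags live in ci/build_wheel_libcugraph.sh; the sonames below are illustrative):

```shell
# Hedged sketch of the exclusions described above; the real list lives in
# ci/build_wheel_libcugraph.sh, and the sonames here are illustrative.
auditwheel repair \
  --exclude libcublas.so.12 \
  --exclude libcusolver.so.11 \
  --exclude libcusparse.so.12 \
  -w final_dist \
  dist/*.whl
```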

raydouglass pushed a commit to rapidsai/raft that referenced this pull request Jan 16, 2025
Replaces #2306, contributes to
rapidsai/build-planning#33.

Proposes packaging `libraft` as a wheel, which is then re-used by:

* `pylibraft-cu{11,12}` and `raft-cu{11,12}` (this PR)
* `libcugraph-cu{11,12}`, `pylibcugraph-cu{11,12}`, and
`cugraph-cu{11,12}` in rapidsai/cugraph#4804
* `libcuml-cu{11,12}` and `cuml-cu{11,12}` in
rapidsai/cuml#6199

As part of this, also proposes:

* introducing a new CMake option, `RAFT_COMPILE_DYNAMIC_ONLY`, to allow
building/installing only the dynamic shared library (i.e. skipping the
static library)
* enforcing `rapids-cmake`'s preferred CMake style
(#2531 (comment))
* making wheel-building CI jobs always depend on other wheel-building CI
jobs, not tests or `*-publish` (to reduce end-to-end CI time)
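
As a hedged illustration of the first item, the new option would be passed at configure time roughly like this (paths and the companion flag are illustrative, not taken from this PR):

```shell
# Hedged sketch: enabling the new option when configuring RAFT's C++ build.
# Paths and the companion RAFT_COMPILE_LIBRARY flag are illustrative, not taken
# from this PR.
cmake -S cpp -B cpp/build \
  -DRAFT_COMPILE_LIBRARY=ON \
  -DRAFT_COMPILE_DYNAMIC_ONLY=ON
```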

## Notes for Reviewers

### Benefits of these changes

* smaller wheels (see "Size Changes" below)
* faster compile times (no more re-compiling RAFT in cuGraph and cuML
CI)
* other benefits mentioned in
rapidsai/build-planning#33

### Wheel contents

`libraft`:

* `libraft.so` (shared library)
* RAFT headers
* vendored dependencies (`fmt`, CCCL, `cuco`, `cute`, `cutlass`)

`pylibraft`:

* `pylibraft` Python / Cython code and compiled Cython extensions

`raft-dask`:

* `raft-dask` Python / Cython code and compiled Cython extension

### Dependency Flows

In short.... `libraft` contains a `libraft.so` dynamic library and the
headers to link against it.

* Anything that needs to link against RAFT at build time pulls in
`libraft` wheels as a build dependency.
* Anything that needs RAFT's symbols at runtime pulls it in as a runtime
dependency, and calls `libraft.load_library()`.

For more details and some flowcharts, see
rapidsai/build-planning#33 (comment)

### Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (these PRs) | size (before) | size (these PRs) |
|:---------------|------------------:|-----------------:|--------------:|-------------:|
| `libraft` | --- | 3169 | --- | 19M |
| `pylibraft` | 64 | 63 | 11M | 1M |
| `raft-dask` | 29 | 28 | 188M | 188M |
| `libcugraph` | --- | 1762 | --- | 903M |
| `pylibcugraph` | 190 | 187 | 901M | 2M |
| `cugraph` | 315 | 313 | 899M | 3.0M |
| `libcuml` | --- | 1766 | --- | 289M |
| `cuml` | 442 | --- | 517M | --- |
| **TOTAL** | **1,040** | **7,268** | **2,516M** | **1,405M** |

*NOTES: size = compressed, "before" = 2025-01-13 nightlies*

<details><summary>how I calculated those (click me)</summary>

* `cugraph`: nightly commit =
rapidsai/cugraph@8507cbf,
PR = rapidsai/cugraph#4804
* `cuml`: nightly commit =
rapidsai/cuml@7c715c4,
PR = rapidsai/cuml#6199
* `raft`: nightly commit =
1b62c41,
PR = this PR

```shell
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-13 \
    --env CUGRAPH_NIGHTLY_SHA=8507cbf63db2f349136b266d3e6e787b189f45a0 \
    --env CUGRAPH_PR="pull-request/4804" \
    --env CUGRAPH_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env CUML_NIGHTLY_SHA=7c715c494dff71274d0fdec774bdee12a7e78827 \
    --env CUML_PR="pull-request/6199" \
    --env CUML_PR_SHA="2ef32eaa006a84c0bd16220bb8e8af34198fbee8" \
    --env RAFT_NIGHTLY_SHA=1b62c4117a35b11ce3c830daae248e32ebf75e3f \
    --env RAFT_PR="pull-request/2531" \
    --env RAFT_PR_SHA="0d6597b08919f2aae8ac268f1a68d6a8fe5beb4e" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_SHA=${RAFT_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_SHA=${CUGRAPH_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_SHA=${CUML_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libraft
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibraft
RAPIDS_PY_WHEEL_NAME="pylibraft_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# raft-dask
RAPIDS_PY_WHEEL_NAME="raft_dask_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/raft \
RAPIDS_REF_NAME="${RAFT_PR}" \
RAPIDS_SHA="${RAFT_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcugraph
RAPIDS_PY_WHEEL_NAME="libcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# pylibcugraph
RAPIDS_PY_WHEEL_NAME="pylibcugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# cugraph
RAPIDS_PY_WHEEL_NAME="cugraph_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cugraph \
RAPIDS_REF_NAME="${CUGRAPH_PR}" \
RAPIDS_SHA="${CUGRAPH_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

# libcuml
RAPIDS_PY_WHEEL_NAME="libcuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuml
RAPIDS_PY_WHEEL_NAME="cuml_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuml \
RAPIDS_REF_NAME="${CUML_PR}" \
RAPIDS_SHA="${CUML_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
```

</details>

### How I tested this

These other PRs:

* rapidsai/devcontainers#435
* rapidsai/cugraph-gnn#110
* rapidsai/cuml#6199
* rapidsai/cugraph#4804
Contributor

bdice commented Jan 17, 2025

I bumped this to use published libraft wheels.

Member Author

Python tests here are failing as a result of these RMM changes: rapidsai/rmm#1775

E FutureWarning: The rmm._cuda.stream module is deprecated in 25.02 and will be removed in a future release. Use rmm.pylibrmm.stream instead.

(build link)

I think that might be coming through via cudf. It's being discussed at rapidsai/cuml#6229 (comment)

@@ -109,4 +110,3 @@ find .devcontainer/ -type f -name devcontainer.json -print0 | while IFS= read -r
done

sed_runner "s/:[0-9][0-9]\.[0-9][0-9]/:${NEXT_SHORT_TAG}/" ./notebooks/README.md
sed_runner "s/branch-[0-9][0-9].[0-9][0-9]/branch-${NEXT_SHORT_TAG}/" ./docs/cugraph/source/nx_cugraph/nx_cugraph.md
Member Author

To test the changes in this PR, I ran update-version.sh like this:

ci/release/update-version.sh '25.04.00'

That revealed that this file doesn't exist any more... because all the docs were moved over to https://github.com/rapidsai/cugraph-docs in #4837

I thought it wasn't worth a new PR and CI run just for this, so I'm just adding the fix here.

Contributor

@bdice left a comment

Some comments to address.

.github/workflows/build.yaml (outdated, resolved)
.github/workflows/pr.yaml (outdated, resolved)
.github/workflows/pr.yaml (outdated, resolved)
ci/build_wheel.sh (resolved)
Comment on lines 33 to 34
PARALLEL_LEVEL=$(python -c \
"from math import ceil; from multiprocessing import cpu_count; print(ceil(cpu_count()/4))")
Contributor

This is copied from pylibcugraph, so I want to question it a bit. Are we really limited to only 1/4 of the available cores? I am guessing it's due to memory requirements at compile time?

Can we try to be more aggressive in a follow-up PR? Or can we add comments to explain why this is so limited?

Member Author

> I am guessing it's due to memory requirements at compile time?

Exactly, this was added back in #4489 (comment) to deal with out-of-memory issues building wheels.

Since most of the heavy compilation would move to libcugraph wheels in this PR, I've moved this limiting (and requesting a cpu32 node) over to libcugraph here.

HOWEVER... maybe now that these builds aren't also compiling RAFT and since #4720 reduced the number of TUs being compiled, we could get away with a higher degree of parallelism.

I'll push a change here trying to remove this limit, let's see what happens.

Member Author

I just pushed a change removing this limit on PARALLEL_LEVEL. I left the job requesting a cpu32 node for now. Let's see how it goes.
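
For reference, a hedged sketch of the simpler form (not necessarily the exact change pushed here):

```shell
# Hedged sketch only; the actual change lives in this PR's CI scripts.
# Default build parallelism to all available cores, overridable by the caller.
export PARALLEL_LEVEL="${PARALLEL_LEVEL:-$(nproc)}"
```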

Comment on lines +165 to +166
- depends_on_libraft
- depends_on_librmm
Contributor

It surprises me that we have so many C++ wheel dependencies in pylibcugraph. Does pylibcugraph directly use RAFT / RMM C++ headers? I would have expected that pylibraft / rmm would have sufficed here.

Member Author

> Does pylibcugraph directly use RAFT / RMM C++ headers?

pylibcugraph doesn't, but RAFT / RMM appear in libcugraph's public headers and are therefore PUBLIC dependencies in CMake. So those headers need to get into the build environment to link against libcugraph.so.

#include <raft/core/device_span.hpp>
#include <raft/core/handle.hpp>
#include <raft/random/rng_state.hpp>
#include <rmm/resource_ref.hpp>

cugraph/cpp/CMakeLists.txt

Lines 482 to 485 in 0b50bf9

target_link_libraries(cugraph
PUBLIC
rmm::rmm
raft::raft

This is related to the discussions from rapidsai/build-planning#92 , and it's for similar reasons that e.g. pylibcudf has a librmm build dependency despite not directly including any RMM headers:

https://github.com/rapidsai/cudf/blob/a4bbd0930a0e4922f69586560b064a0bd9e6aedc/python/pylibcudf/pyproject.toml#L115

@jameslamb added the labels improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change), and removed the label DO NOT MERGE (Hold off on merging; see PR for details) on Jan 17, 2025
bdice added a commit to rapidsai/devcontainers that referenced this pull request Jan 18, 2025
Contributes to rapidsai/build-planning#33

Adjusts `rapids-build-utils` manifest for release 25.02 to account for
the introduction of new `libcugraph` wheels
(rapidsai/cugraph#4804).

## Notes for Reviewers

This shouldn't be merged while it's still pointing at my forks. Plan:

1. admin-merge rapidsai/cugraph#4804 once
everything except devcontainers CI there is passing
2. point this PR at upstream `rapidsai/cugraph`
3. observe CI passing and merge this normally (or admin-merge to save
time)

---------

Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Paul Taylor <[email protected]>
Contributor

bdice commented Jan 18, 2025

/merge

rapids-bot (bot) merged commit 8ebff3b into rapidsai:branch-25.02 on Jan 18, 2025
82 checks passed
Labels: ci, CMake, cuGraph, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), python
6 participants