cupy and horovod cannot be installed into the same powerai 1.6 environment due to compiler incompatibilities #19
Comments
Looks like cupy mainly found CUDA OK (note it found But the one problem library:
is a bit different. libcuda is kind of part of the GPU driver stack, rather than part of the CUDA Toolkit proper. The Toolkit does include a stub copy of that library, suitable for building apps against. You should find that in For running the app, you'll need the "real" |
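As a rough way to check the runtime side of the point above, the following sketch asks the dynamic loader whether it can find and load a `libcuda` at all. The helper names `find_cuda_driver` and `can_load_cuda_driver` are made up for illustration. Note that this does not distinguish the Toolkit's stub copy from the driver's real library; it only reports whether something named `libcuda` is visible on the loader's search path.

```python
import ctypes
import ctypes.util


def find_cuda_driver():
    """Return the library name the loader resolves for 'cuda', or None.

    Hypothetical helper: relies only on the standard ctypes lookup, so the
    result depends on the loader cache and the library search path.
    """
    return ctypes.util.find_library("cuda")


def can_load_cuda_driver():
    """Return True if a libcuda can actually be opened by the loader."""
    name = find_cuda_driver()
    if name is None:
        return False
    try:
        ctypes.CDLL(name)
    except OSError:
        # Found by name but could not be loaded (e.g. missing dependencies).
        return False
    return True


if __name__ == "__main__":
    print("libcuda found:", find_cuda_driver())
    print("libcuda loadable:", can_load_cuda_driver())
```

If this reports that nothing was found on a machine with a working GPU driver, the library search path is a likely suspect.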
@hartb the issue is actually much simpler. The Horovod installation instructions for powerai 1.6 from @nvcastet require a compiler toolchain installed as conda packages (see link below). This interferes with the way libraries are set up and searched for on the machine. |
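One way to see whether a conda-provided toolchain is shadowing the system one is to inspect the active environment. This is only a diagnostic sketch: the helper name `toolchain_report` is hypothetical, and it assumes the conda compiler packages export variables such as `CC` and `CXX` when the environment is activated, which may not hold for every setup.

```python
import os
import shutil


def toolchain_report(env=None):
    """Summarize compiler-related settings of an environment mapping.

    Hypothetical helper: `env` defaults to os.environ. The variable list is
    an assumption about what a conda toolchain typically sets on activation.
    """
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    report = {}
    for var in ("CC", "CXX", "LDFLAGS", "LD_LIBRARY_PATH"):
        report[var] = env.get(var)
    for tool in ("gcc", "g++"):
        # Resolve the tool against the PATH of the given environment.
        path = shutil.which(tool, path=env.get("PATH"))
        report[tool] = path
        # True only when the resolved binary lives under the conda prefix.
        report[tool + "_from_conda"] = bool(
            prefix and path and path.startswith(prefix)
        )
    return report


if __name__ == "__main__":
    for key, value in toolchain_report().items():
        print(f"{key}: {value}")
```

Comparing the output with and without the conda environment activated shows which compiler and library paths a `pip install` build would pick up.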
@nvcastet is horovod still not provided as a conda package with powerai 1.6.1? How can I resolve the issue above? |
@denfromufa If I can speak for @nvcastet... I'm afraid horovod is still not included as a package in PowerAI 1.6.1 (now called Watson Machine Learning Community Edition, WML CE). But I was able to build horovod and cupy together in a PowerAI 1.6.0 container with the steps below. Maybe they'll work for you. I'm not familiar enough with horovod to exercise the components together, but at least I can get the build to work:
|
A note on using OpenMPI on Power systems: I recently needed to run a model written with Horovod and mpi4py. I could not get horovodrun to work with the NCCL backend under Spectrum MPI, so I eventually used OpenMPI. Trying to compile and run Horovod against the OpenMPI installed as an RPM with PowerAI's TensorFlow did not work for various reasons. What eventually worked, and worked well, was:
|
Ok, guys - let me test this out in the next few days :) @smatzek how did you build openmpi as a conda package, which recipe did you use? |
I'm getting this error for
pip install cupy
with powerai 1.6: