This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

cupy and horovod cannot be installed into the same powerai 1.6 environment due to compiler incompatibilities #19

Open
den-run-ai opened this issue May 7, 2019 · 6 comments

Comments

@den-run-ai

I'm getting this error for pip install cupy with powerai 1.6:

Collecting cupy
  Using cached https://files.pythonhosted.org/packages/cd/d6/532e5da87f3b513cd0b98bcbf9a58fb6758598039944c42cb93d13b71a5f/cupy-5.4.0.tar.gz
    Complete output from command python setup.py egg_info:
    Options: {'package_name': 'cupy', 'long_description': None, 'wheel_libs': [], 'no_rpath': False, 'profile': False, 'linetrace': False, 'annotate': False, 'no_cuda': False}
    
    -------- Configuring Module: cuda --------
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    /data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/7.3.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: cannot find -lcuda
    collect2: error: ld returned 1 exit status
    Cannot build a stub file.
    Original error: command '/data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/bin/powerpc64le-conda_cos7-linux-gnu-c++' failed with exit status 1
    
    ************************************************************
    * CuPy Configuration Summary                               *
    ************************************************************
    
    Build Environment:
      Include directories: ['/data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/include']
      Library directories: ['/data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/lib64', '/data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/lib']
      nvcc command       : ['/data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/bin/nvcc']
    
    Environment Variables:
      CFLAGS          : -mcpu=power8 -mtune=power8 -mpower8-fusion -mpower8-vector -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe
      LDFLAGS         : -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now
      LIBRARY_PATH    : (none)
      CUDA_PATH       : (none)
      NVTOOLSEXT_PATH : (none)
      NVCC            : (none)
    
    Modules:
      cuda      : No
        -> Cannot link libraries: ['cublas', 'cuda', 'cudart', 'cufft', 'curand', 'cusparse', 'nvrtc']
        -> Check your LDFLAGS environment variable.
    
    ERROR: CUDA could not be found on your system.
    Please refer to the Installation Guide for details:
    https://docs-cupy.chainer.org/en/stable/install.html
    
    ************************************************************
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-f62gddrq/cupy/setup.py", line 120, in <module>
        ext_modules = cupy_setup_build.get_ext_modules()
      File "/tmp/pip-install-f62gddrq/cupy/cupy_setup_build.py", line 588, in get_ext_modules
        extensions = make_extensions(arg_options, compiler, use_cython)
      File "/tmp/pip-install-f62gddrq/cupy/cupy_setup_build.py", line 384, in make_extensions
        raise Exception('Your CUDA environment is invalid. '
    Exception: Your CUDA environment is invalid. Please check above error log.
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-f62gddrq/cupy/

@hartb
Member

hartb commented May 7, 2019

Looks like cupy mainly found CUDA OK (note it found nvcc), and most of those CUDA libraries in the PowerAI packaging will be under lib64 in your environment (i.e. ..../powerai.1.6/lib64/).

But the one problem library:

    /data/gpfs/Users/j0541825/anaconda3/envs/powerai.1.6/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/7.3.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: cannot find -lcuda
    collect2: error: ld returned 1 exit status

is a bit different. libcuda is kind of part of the GPU driver stack, rather than part of the CUDA Toolkit proper.

The Toolkit does include a stub copy of that library, suitable for building apps against. You should find that in $CONDA_PREFIX/lib/stubs/. You'll probably need to inform cupy about its location (possibly via LDFLAGS or LIBRARY_PATH).

For running the app, you'll need the "real" libcuda.so. That should be installed in some default system search location as part of the GPU driver installation.
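As a rough sketch of the LDFLAGS route mentioned above: this assumes the stub sits at `$CONDA_PREFIX/lib/stubs/libcuda.so`, and the helper name `stub_ldflags` is made up for illustration. It adds the stub directory to the linker search path only when the stub is actually present.

```shell
# Hypothetical helper: print LDFLAGS extended with the libcuda stub directory.
# $1 = environment prefix (e.g. $CONDA_PREFIX), $2 = current LDFLAGS (may be empty)
stub_ldflags() {
    prefix=$1
    current=$2
    stub_dir="$prefix/lib/stubs"
    if [ -e "$stub_dir/libcuda.so" ]; then
        # Stub found: put its directory on the linker search path.
        printf '%s\n' "-L$stub_dir${current:+ $current}"
    else
        # No stub: print LDFLAGS unchanged and report on stderr.
        echo "no libcuda.so stub under $stub_dir" >&2
        printf '%s\n' "$current"
        return 1
    fi
}

# Usage (not run here):
#   LDFLAGS="$(stub_ldflags "$CONDA_PREFIX" "$LDFLAGS")" pip install cupy
```

The stub only satisfies the linker at build time; at run time the driver's own libcuda.so still has to be found.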

@den-run-ai
Author

@hartb the issue is actually much simpler. The Horovod installation instructions for powerai 1.6 from @nvcastet require a compiler toolchain installed as conda packages (see link below). This interferes with the way the libraries are set up and searched on the machine. pip install cupy works out of the box, even without a powerai installation, in a Python 3.6 PowerPC Anaconda environment. Note that after the toolchain for Horovod is installed, the compilers are not removed as part of conda uninstall, so the environment gets messed up. So don't mix cupy and horovod :(

horovod/horovod#847 (comment)
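One way to check whether the conda toolchain is still present in an environment is to look for the prefixed compiler binaries, such as the `powerpc64le-conda_cos7-linux-gnu-c++` seen in the log above. This is only a sketch: the `*-gcc`, `*-g++`, and `*-c++` naming patterns are an assumption, and `list_conda_compilers` is a hypothetical helper name.

```shell
# Hypothetical helper: list conda-packaged compiler binaries found in an environment.
# $1 = environment prefix (e.g. $CONDA_PREFIX)
list_conda_compilers() {
    for f in "$1"/bin/*-gcc "$1"/bin/*-g++ "$1"/bin/*-c++; do
        # An unmatched glob stays literal, so keep only paths that exist.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage (not run here):
#   list_conda_compilers "$CONDA_PREFIX"
```

Empty output suggests no prefixed compilers remain in that environment's bin directory.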

@den-run-ai den-run-ai changed the title from "instructions for building cupy in powerai 1.6?" to "cupy and horovod cannot be installed into the same powerai 1.6 environment due to compiler incompatibilities" on Jun 5, 2019
@den-run-ai
Author

@nvcastet with powerai 1.6.1, is horovod still not provided as a conda package? How can I resolve the issue above?

@hartb
Member

hartb commented Jun 20, 2019

@denfromufa If I can speak for @nvcastet... I'm afraid horovod is still not included as a PowerAI (now Watson Machine Learning Community Edition (WML CE)) 1.6.1 package.

But I was able to build horovod and cupy together in a PowerAI 1.6.0 container with the steps below. Maybe they'll work for you. I'm not familiar enough with horovod to exercise the components together, but at least I can get the build to work:

# Install compiler as from @nvcastet's blog, but be sure to install
# gcc / g++ v7, rather than the default (v8).

conda install gxx_linux-ppc64le=7 cffi cudatoolkit-dev

# Ensure that the Anaconda compilers are visible in the path as "gcc" and "g++".
# Needed for both horovod and cupy build.
#
# nvcc tries to execute the compilers by those names specifically. It doesn't
# honor typical environment variables (e.g. CC, GCC) that would point to
# the compiler. And it invokes the compilers in a way that's not informed
# by shell aliases.

mkdir -p "$HOME/bin"
ln -s "$CONDA_PREFIX"/bin/*-gcc "$HOME/bin/gcc"
ln -s "$CONDA_PREFIX"/bin/*-g++ "$HOME/bin/g++"
export PATH="$PATH:$HOME/bin"
which gcc g++

# Build horovod as described by @nvcastet

HOROVOD_CUDA_HOME=$CONDA_PREFIX HOROVOD_GPU_ALLREDUCE=DDL pip install horovod --no-cache-dir

# Set up variable to help cupy build find libcuda.so

LIBCUDA_DIR=$(find /usr -name "libcuda.so" -printf "%h" -quit)
echo $LIBCUDA_DIR

# Kick off the build

LDFLAGS="$LDFLAGS -L$LIBCUDA_DIR" pip install cupy
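Since the steps above depend on `gcc` and `g++` resolving to a v7 compiler, a small pre-flight check before the two builds may save a failed compile. This is a sketch under assumptions: `gcc_major` and `check_gcc7` are hypothetical helper names, and the check relies only on `-dumpversion`, which may print either the full version or just the major number.

```shell
# Hypothetical helper: extract the major version from a `gcc -dumpversion` string.
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical check: succeed only if the named compiler is on PATH with major version 7.
# $1 = compiler name to look up (defaults to gcc)
check_gcc7() {
    cc=${1:-gcc}
    command -v "$cc" >/dev/null 2>&1 || { echo "$cc not found on PATH" >&2; return 1; }
    major=$(gcc_major "$("$cc" -dumpversion)")
    [ "$major" = "7" ] || { echo "$cc is v$major, expected v7" >&2; return 1; }
}

# Usage (not run here):
#   check_gcc7 gcc && check_gcc7 g++
```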

@smatzek
Collaborator

smatzek commented Jun 20, 2019

On using OpenMPI on Power systems: I recently needed to run a model written with Horovod and mpi4py. I could not get horovodrun to work with the NCCL backend with Spectrum MPI, so I eventually used openmpi. Trying to compile and run Horovod against the openmpi installed as an RPM with PowerAI's TensorFlow did not work for various reasons. What eventually worked, and worked well, was:

  1. Build openmpi as a conda package
  2. Install openmpi
  3. Install compilers in the conda env
  4. pip install/build Horovod
  5. pip install/build mpi4py

@den-run-ai
Author

Ok, guys - let me test this out in the next few days :) @smatzek how did you build openmpi as a conda package, which recipe did you use?
