{ai,tools}[GCCcore/12.3.0,foss/2023a] CTranslate2 v4.5.0, cpu_features v0.9.0 w/ CUDA 12.1.1 #22119

Conversation

@pavelToman (Collaborator) commented Jan 6, 2025

(created using eb --new-pr)
dependency of WhisperX: vscentrum/vsc-software-stack#482
dependency of whisper-ctranslate2: vscentrum/vsc-software-stack#483

REPLACED BY: #22134

…features-0.9.0-GCCcore-12.3.0.eb and patches: CTranslate2-4.5.0_replace-cxxopts.patch, CTranslate2-4.5.0_fix-third-party.patch, CTranslate2-4.5.0_fix-tests.patch
github-actions bot commented Jan 6, 2025

Updated software cpu_features-0.9.0-GCCcore-12.3.0.eb

Diff against cpu_features-0.6.0-GCCcore-10.2.0.eb

easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb

diff --git a/easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb b/easybuild/easyconfigs/c/cpu_features/cpu_features-0.9.0-GCCcore-12.3.0.eb
index 42da9feba7..b2314ca5b7 100644
--- a/easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb
+++ b/easybuild/easyconfigs/c/cpu_features/cpu_features-0.9.0-GCCcore-12.3.0.eb
@@ -1,22 +1,23 @@
 # This file is an EasyBuild reciPY as per https://github.com/easybuilders/easybuild
-# Author: Denis Kristak
+# Author: Denis Kristak, update: Pavel Tománek
 easyblock = 'CMakeMake'
 
 name = 'cpu_features'
-version = '0.6.0'
+version = '0.9.0'
 
 homepage = 'https://github.com/google/cpu_features'
 description = """A cross-platform C library to retrieve CPU features (such as available instructions) at runtime."""
 
-toolchain = {'name': 'GCCcore', 'version': '10.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/google/cpu_features/archive/']
 sources = ['v%(version)s.tar.gz']
-checksums = ['95a1cf6f24948031df114798a97eea2a71143bd38a4d07d9a758dda3924c1932']
+checksums = ['bdb3484de8297c49b59955c3b22dba834401bc2df984ef5cfc17acbe69c5018e']
 
 builddependencies = [
-    ('CMake', '3.18.4'),
-    ('binutils', '2.35'),
+    ('CMake', '3.26.3'),
+    ('binutils', '2.40'),
 ]
 
 modextrapaths = {'CPATH': 'include/cpu_features'}

@pavelToman (Collaborator, Author):

@boegelbot please test @ generoso

@boegelbot (Collaborator):

@pavelToman: Request for testing this PR well received on login1

PR test command 'EB_PR=22119 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_22119 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14927

Test results coming soon (I hope)...

- notification for comment with ID 2572993551 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@pavelToman (Collaborator, Author):

Test report by @pavelToman
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4016.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8
See https://gist.github.com/pavelToman/e10e8d82c333c877b4d7739d6ab7696c for a full test report.

@boegelbot (Collaborator):

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/4198302ccca6d5ab77347c3bcb47e00c for a full test report.

@pavelToman (Collaborator, Author):

@boegelbot please test @ jsc-zen3-a100

@boegelbot (Collaborator):

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22119 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22119 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5502

Test results coming soon (I hope)...

- notification for comment with ID 2573056618 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot (Collaborator):

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/af309b58da23ea88b738b620f453897a for a full test report.

@boegel (Member) commented Jan 6, 2025

/tmp/vsc47063/easybuild/build/CTranslate2/4.5.0/foss-2023a-CUDA-12.1.1/CTranslate2-4.5.0/src/ops/mean_gpu.cu(36): error: no operator "/=" matches these operands
            operand types are: __nv_bfloat16 /= float
            output[blockIdx.x] /= AccumT(axis_size);
                               ^
          detected during:
            instantiation of "void ctranslate2::ops::mean_kernel<T,AccumT>(const T *, ctranslate2::cuda::index_t, ctranslate2::cuda::index_t, ctranslate2::cuda::index_t, __nv_bool, T *) [with T=__nv_bfloat16, AccumT=float]" at line 54
            instantiation of "void ctranslate2::ops::Mean::compute<D,T>(const ctranslate2::StorageView &, ctranslate2::dim_t, ctranslate2::dim_t, ctranslate2::dim_t, __nv_bool, ctranslate2::StorageView &) const [with D=ctranslate2::Device::CUDA, T=ctranslate2::bfloat16_t]" at line 68

How does this work on one system, but fail on another... :-/

@boegel (Member) commented Jan 6, 2025

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 9.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 555.42.06, Python 3.9.18
See https://gist.github.com/boegel/0d28f6fd0260c8d30f95355ba130e6b5 for a full test report.

@branfosj (Member) commented Jan 6, 2025

  1. No NVIDIA driver and no GPU available
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
  2. NVIDIA driver available but no GPU
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
  3. NVIDIA driver and GPU available
-- Autodetected CUDA architecture(s):  8.0

Cases 1 and 2 fail, as per #22119 (comment).

Case 3 builds.

@boegel (Member) commented Jan 6, 2025

It could be what @branfosj mentioned, but I also see this on both the V100 (joltik) and A100 (accelgor) system:

EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.6

That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus).
8.6 does make sense for A2 GPUs (like on our donphan cluster).

I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.

@branfosj (Member) commented Jan 6, 2025

You need to use something of the form -DCUDA_ARCH_LIST="8.0;8.6;8.6+PTX"

My limited testing suggests that it fails with 7.0 in that list, i.e. for all GPUs of the V100 generation and older...

  if(NOT CUDA_ARCH_LIST)
    set(CUDA_ARCH_LIST "Auto")
  elseif(CUDA_ARCH_LIST STREQUAL "Common")
    set(CUDA_ARCH_LIST ${CUDA_COMMON_GPU_ARCHITECTURES})
    # Keep deprecated but not yet dropped Compute Capabilities.
    if(CUDA_VERSION_MAJOR EQUAL 11)
      list(INSERT CUDA_ARCH_LIST 0 "3.5" "5.0")
    endif()
    list(REMOVE_DUPLICATES CUDA_ARCH_LIST)
  endif()
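
For reference, a minimal sketch (not part of this PR) of how an easyconfig could hand an explicit architecture list to CTranslate2's CMake so the "Auto"/"Common" fallback above is never taken; the %(cuda_cc_semicolon_sep)s template and this CUDA_ARCH_LIST handling are assumed here based on common EasyBuild/CMakeMake conventions, not taken from the easyconfig under review.

# Hypothetical sketch, not taken from this PR: forward the compute capabilities
# EasyBuild is configured with (--cuda-compute-capabilities, or the
# EASYBUILD_CUDA_COMPUTE_CAPABILITIES environment variable) to CTranslate2's
# CMake, so its GPU auto-detection is bypassed entirely.
configopts = '-DCUDA_ARCH_LIST="%(cuda_cc_semicolon_sep)s"'

With EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.0 this would presumably expand to -DCUDA_ARCH_LIST="8.0", matching the configuration that built successfully in the tests above.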

@boegel added this to the "release after 4.9.4" milestone Jan 6, 2025
@pavelToman (Collaborator, Author):

Test report by @pavelToman
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3305.joltik.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 1 x NVIDIA Tesla V100-SXM2-32GB, 555.42.06, Python 3.9.18
See https://gist.github.com/pavelToman/a8766f9237b0d170a218f1fe66804828 for a full test report.

@branfosj (Member) commented Jan 7, 2025

> In the CTranslate2 docs it says "Use a NVIDIA GPU with Tensor Cores (Compute Capability >= 7.0)", so it should work with V100, shouldn't it?

The error is about __nv_bfloat16, and I think those are only supported for CUDA compute capability >= 8.0.

@pavelToman (Collaborator, Author):

I found that the V100 GPU could be the problem: on V100 (sm_70) the CUDA headers do not provide all of the bfloat16 arithmetic operators the code uses (in this case operator/=).
I am going to test it on a V100 GPU with CT2_CUDA_ALLOW_BF16=0 to see if that helps.

@pavelToman (Collaborator, Author) commented Jan 7, 2025

CT2_CUDA_ALLOW_BF16=0 does not help for V100, but downgrading to version 4.3.1 does. It seems the latest versions (4.4.0, 4.5.0) no longer support CUDA compute capability 7.0.

@pavelToman (Collaborator, Author):

> It could be what @branfosj mentioned, but I also see this on both the V100 (joltik) and A100 (accelgor) system:
>
> EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.6
>
> That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus). 8.6 does make sense for A2 GPUs (like on our donphan cluster).
>
> I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.

Is this something I can fix by adding cuda_compute_capabilities = ['8.0', '8.6'] to the easyconfig?

@boegel (Member) commented Jan 7, 2025

> CT2_CUDA_ALLOW_BF16=0 does not help for V100, but downgrading to version 4.3.1 does. It seems the latest versions (4.4.0, 4.5.0) no longer support CUDA compute capability 7.0.

OK, fine, then we can't install this on our V100 system, no problem.

> That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus). 8.6 does make sense for A2 GPUs (like on our donphan cluster).
> I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.
>
> Is this something I can fix by adding cuda_compute_capabilities = ['8.0', '8.6'] to the easyconfig?

No, people should correctly configure EasyBuild to use the CUDA compute capability that matches their GPU; there should be no system-specific settings in the easyconfigs.

description = "Fast inference engine for Transformer models."

toolchain = {'name': 'foss', 'version': '2023a'}
toolchainopts = {'optarch': False}
A Member left an inline review comment on the CTranslate2 easyconfig lines quoted above:

@pavelToman Any specific reason you have this here?

Are there problems when this is not used?

@pavelToman (Collaborator, Author) replied Jan 7, 2025:

Yes, without toolchainopts = {'optarch': False} the build fails.
The log of a build without it is here: https://github.com/vscentrum/vsc-software-stack/blob/wip/482_WhisperX/log2.txt

Reply from a Member:

Pavel told me that this is required to dance around a compiler problem, like:

/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
                  __v8hf __attribute__ ((__vector_size__ (16)));

see also https://forums.developer.nvidia.com/t/including-cub-header-breakes-compilation-with-gcc-12-and-sse2-or-better/255018

Maybe we should consider using a more recent CUDA version here; can you check whether that would be feasible, @pavelToman?

@pavelToman (Collaborator, Author):

It seems GCC 12.3.0 and CUDA 12.1.1 are incompatible, so I replaced CUDA 12.1.1 with version 12.6.0 and it works!
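
For reference, a minimal sketch of what the CUDA version swap described above could look like in the CTranslate2 easyconfig; the surrounding parameters are assumptions based on standard easyconfig layout, not copied from this PR or its replacement.

# Hypothetical sketch only (not the actual easyconfig from this PR): move the
# CUDA dependency from 12.1.1 to 12.6.0 to avoid the GCC 12.3.0 + CUDA 12.1.1
# <avx512fp16intrin.h> incompatibility noted above.
name = 'CTranslate2'
version = '4.5.0'
versionsuffix = '-CUDA-%(cudaver)s'

toolchain = {'name': 'foss', 'version': '2023a'}

dependencies = [
    ('CUDA', '12.6.0', '', SYSTEM),  # was ('CUDA', '12.1.1', '', SYSTEM)
    # other dependencies unchanged
]

Whether the optarch = False workaround discussed above is still needed once the newer CUDA is used is not settled in this thread.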

@boegel (Member) commented Jan 7, 2025

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4301.litleo.os - Linux RHEL 9.4, x86_64, AMD EPYC 9454P 48-Core Processor (zen4), 1 x NVIDIA NVIDIA H100 NVL, 555.42.06, Python 3.9.18
See https://gist.github.com/boegel/6206884e4a77bbea9e9f3adfb13ad717 for a full test report.
