{ai,tools}[GCCcore/12.3.0,foss/2023a] CTranslate2 v4.5.0, cpu_features v0.9.0 w/ CUDA 12.1.1 #22119

Conversation

@pavelToman (Collaborator) commented Jan 6, 2025

(created using eb --new-pr)
dependency of WhisperX: vscentrum/vsc-software-stack#482
dependency of whisper-ctranslate2: vscentrum/vsc-software-stack#483

REPLACED BY: #22134

…features-0.9.0-GCCcore-12.3.0.eb and patches: CTranslate2-4.5.0_replace-cxxopts.patch, CTranslate2-4.5.0_fix-third-party.patch, CTranslate2-4.5.0_fix-tests.patch
github-actions bot commented Jan 6, 2025

Updated software cpu_features-0.9.0-GCCcore-12.3.0.eb

Diff against cpu_features-0.6.0-GCCcore-10.2.0.eb

easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb

diff --git a/easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb b/easybuild/easyconfigs/c/cpu_features/cpu_features-0.9.0-GCCcore-12.3.0.eb
index 42da9feba7..b2314ca5b7 100644
--- a/easybuild/easyconfigs/c/cpu_features/cpu_features-0.6.0-GCCcore-10.2.0.eb
+++ b/easybuild/easyconfigs/c/cpu_features/cpu_features-0.9.0-GCCcore-12.3.0.eb
@@ -1,22 +1,23 @@
 # This file is an EasyBuild reciPY as per https://github.com/easybuilders/easybuild
-# Author: Denis Kristak
+# Author: Denis Kristak, update: Pavel Tománek
 easyblock = 'CMakeMake'
 
 name = 'cpu_features'
-version = '0.6.0'
+version = '0.9.0'
 
 homepage = 'https://github.com/google/cpu_features'
 description = """A cross-platform C library to retrieve CPU features (such as available instructions) at runtime."""
 
-toolchain = {'name': 'GCCcore', 'version': '10.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
+toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/google/cpu_features/archive/']
 sources = ['v%(version)s.tar.gz']
-checksums = ['95a1cf6f24948031df114798a97eea2a71143bd38a4d07d9a758dda3924c1932']
+checksums = ['bdb3484de8297c49b59955c3b22dba834401bc2df984ef5cfc17acbe69c5018e']
 
 builddependencies = [
-    ('CMake', '3.18.4'),
-    ('binutils', '2.35'),
+    ('CMake', '3.26.3'),
+    ('binutils', '2.40'),
 ]
 
 modextrapaths = {'CPATH': 'include/cpu_features'}

@pavelToman (Collaborator, Author):

@boegelbot please test @ generoso

@boegelbot (Collaborator):

@pavelToman: Request for testing this PR well received on login1

PR test command 'EB_PR=22119 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_22119 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14927

Test results coming soon (I hope)...

- notification for comment with ID 2572993551 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@pavelToman (Collaborator, Author):

Test report by @pavelToman
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4016.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8
See https://gist.github.com/pavelToman/e10e8d82c333c877b4d7739d6ab7696c for a full test report.

@boegelbot (Collaborator):

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/4198302ccca6d5ab77347c3bcb47e00c for a full test report.

@pavelToman (Collaborator, Author):

@boegelbot please test @ jsc-zen3-a100

@boegelbot (Collaborator):

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22119 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22119 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5502

Test results coming soon (I hope)...

- notification for comment with ID 2573056618 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot (Collaborator):

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.21
See https://gist.github.com/boegelbot/af309b58da23ea88b738b620f453897a for a full test report.

@boegel (Member) commented Jan 6, 2025

/tmp/vsc47063/easybuild/build/CTranslate2/4.5.0/foss-2023a-CUDA-12.1.1/CTranslate2-4.5.0/src/ops/mean_gpu.cu(36): error: no operator "/=" matches these operands
            operand types are: __nv_bfloat16 /= float
            output[blockIdx.x] /= AccumT(axis_size);
                               ^
          detected during:
            instantiation of "void ctranslate2::ops::mean_kernel<T,AccumT>(const T *, ctranslate2::cuda::index_t, ctranslate2::cuda::index_t, ctranslate2::cuda::index_t, __nv_bool, T *) [with T=__nv_bfloat16, AccumT=float]" at line 54
            instantiation of "void ctranslate2::ops::Mean::compute<D,T>(const ctranslate2::StorageView &, ctranslate2::dim_t, ctranslate2::dim_t, ctranslate2::dim_t, __nv_bool, ctranslate2::StorageView &) const [with D=ctranslate2::Device::CUDA, T=ctranslate2::bfloat16_t]" at line 68

How does this work on one system, but fail on another... :-/

@boegel (Member) commented Jan 6, 2025

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 9.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 555.42.06, Python 3.9.18
See https://gist.github.com/boegel/0d28f6fd0260c8d30f95355ba130e6b5 for a full test report.

@branfosj (Member) commented Jan 6, 2025

  1. No NVIDIA driver and no GPU available
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
  2. NVIDIA driver available but no GPU
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 5.3;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
  3. NVIDIA driver and GPU available
-- Autodetected CUDA architecture(s):  8.0

Cases 1 and 2 fail, as per #22119 (comment).

Case 3 builds.

@boegel (Member) commented Jan 6, 2025

It could be what @branfosj mentioned, but I also see this on both the V100 (joltik) and A100 (accelgor) system:

EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.6

That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus).
8.6 does make sense for A2 GPUs (like on our donphan cluster).

I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.

@branfosj (Member) commented Jan 6, 2025

You need to use something of the form -DCUDA_ARCH_LIST="8.0;8.6;8.6+PTX"

My limited testing suggests that it fails with 7.0 in that list, i.e. for all GPUs of the V100 generation and older...

  if(NOT CUDA_ARCH_LIST)
    set(CUDA_ARCH_LIST "Auto")
  elseif(CUDA_ARCH_LIST STREQUAL "Common")
    set(CUDA_ARCH_LIST ${CUDA_COMMON_GPU_ARCHITECTURES})
    # Keep deprecated but not yet dropped Compute Capabilities.
    if(CUDA_VERSION_MAJOR EQUAL 11)
      list(INSERT CUDA_ARCH_LIST 0 "3.5" "5.0")
    endif()
    list(REMOVE_DUPLICATES CUDA_ARCH_LIST)
  endif()
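
For reference, a minimal sketch (not part of this PR) of how an easyconfig could hand an explicit architecture list to CTranslate2's CMake so the "Auto"/"Common" fallback above is never taken; the %(cuda_cc_semicolon_sep)s template and this CUDA_ARCH_LIST handling are assumed here based on common EasyBuild/CMakeMake conventions, not taken from the easyconfig under review.

# Hypothetical sketch, not taken from this PR: forward the compute capabilities
# EasyBuild is configured with (--cuda-compute-capabilities, or the
# EASYBUILD_CUDA_COMPUTE_CAPABILITIES environment variable) to CTranslate2's
# CMake, so its GPU auto-detection is bypassed entirely.
configopts = '-DCUDA_ARCH_LIST="%(cuda_cc_semicolon_sep)s"'

With EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.0 this would presumably expand to -DCUDA_ARCH_LIST="8.0", matching the configuration that built successfully in the tests above.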

@boegel added this to the "release after 4.9.4" milestone Jan 6, 2025
@pavelToman (Collaborator, Author):

Test report by @pavelToman
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
node3305.joltik.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 1 x NVIDIA Tesla V100-SXM2-32GB, 555.42.06, Python 3.9.18
See https://gist.github.com/pavelToman/a8766f9237b0d170a218f1fe66804828 for a full test report.

@branfosj (Member) commented Jan 7, 2025

> In the CTranslate2 docs it says "Use a NVIDIA GPU with Tensor Cores (Compute Capability >= 7.0)", so it should work with V100, shouldn't it?

The error is about __nv_bfloat16, and I think those are only supported for CUDA compute capability >= 8.0.

@pavelToman (Collaborator, Author):

I found that the V100 GPU could be the problem: on V100 (sm_70) the CUDA headers do not provide all of the bfloat16 arithmetic operators the code uses (in this case operator/=).
I am going to test it on a V100 GPU with CT2_CUDA_ALLOW_BF16=0 to see if that helps.

@pavelToman (Collaborator, Author) commented Jan 7, 2025

CT2_CUDA_ALLOW_BF16=0 does not help for V100, but downgrading to version 4.3.1 does. It seems the latest versions (4.4.0, 4.5.0) no longer support CUDA compute capability 7.0.

@pavelToman (Collaborator, Author):

> It could be what @branfosj mentioned, but I also see this on both the V100 (joltik) and A100 (accelgor) system:
>
> EASYBUILD_CUDA_COMPUTE_CAPABILITIES=8.6
>
> That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus). 8.6 does make sense for A2 GPUs (like on our donphan cluster).
>
> I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.

Is this something I can fix by adding cuda_compute_capabilities = ['8.0', '8.6'] to the easyconfig?

@boegel (Member) commented Jan 7, 2025

> CT2_CUDA_ALLOW_BF16=0 does not help for V100, but downgrading to version 4.3.1 does. It seems the latest versions (4.4.0, 4.5.0) no longer support CUDA compute capability 7.0.

OK, fine, then we can't install this on our V100 system, no problem.

> That's incorrect, since the A100 only supports CUDA compute capability 8.0, while the V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus). 8.6 does make sense for A2 GPUs (like on our donphan cluster).
> I tested myself on our A100 system using 8.0 instead of 8.6, and that works fine.
>
> Is this something I can fix by adding cuda_compute_capabilities = ['8.0', '8.6'] to the easyconfig?

No, people should correctly configure EasyBuild to use the CUDA compute capability that matches their GPU; there should be no system-specific settings in the easyconfigs.

description = "Fast inference engine for Transformer models."

toolchain = {'name': 'foss', 'version': '2023a'}
toolchainopts = {'optarch': False}
A Member left an inline review comment on the CTranslate2 easyconfig lines quoted above:

@pavelToman Any specific reason you have this here?

Are there problems when this is not used?

@pavelToman (Collaborator, Author) replied Jan 7, 2025:

Yes, without toolchainopts = {'optarch': False} the build fails.
The log of a build without it is here: https://github.com/vscentrum/vsc-software-stack/blob/wip/482_WhisperX/log2.txt

Reply from a Member:

Pavel told me that this is required to dance around a compiler problem, like:

/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
                  __v8hf __attribute__ ((__vector_size__ (16)));

see also https://forums.developer.nvidia.com/t/including-cub-header-breakes-compilation-with-gcc-12-and-sse2-or-better/255018

Maybe we should consider using a more recent CUDA version here; can you check whether that would be feasible, @pavelToman?

@pavelToman (Collaborator, Author):

It seems GCC 12.3.0 and CUDA 12.1.1 are incompatible, so I replaced CUDA 12.1.1 with version 12.6.0 and it works!
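
For reference, a minimal sketch of what the CUDA version swap described above could look like in the CTranslate2 easyconfig; the surrounding parameters are assumptions based on standard easyconfig layout, not copied from this PR or its replacement.

# Hypothetical sketch only (not the actual easyconfig from this PR): move the
# CUDA dependency from 12.1.1 to 12.6.0 to avoid the GCC 12.3.0 + CUDA 12.1.1
# <avx512fp16intrin.h> incompatibility noted above.
name = 'CTranslate2'
version = '4.5.0'
versionsuffix = '-CUDA-%(cudaver)s'

toolchain = {'name': 'foss', 'version': '2023a'}

dependencies = [
    ('CUDA', '12.6.0', '', SYSTEM),  # was ('CUDA', '12.1.1', '', SYSTEM)
    # other dependencies unchanged
]

Whether the optarch = False workaround discussed above is still needed once the newer CUDA is used is not settled in this thread.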

@boegel (Member) commented Jan 7, 2025

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node4301.litleo.os - Linux RHEL 9.4, x86_64, AMD EPYC 9454P 48-Core Processor (zen4), 1 x NVIDIA NVIDIA H100 NVL, 555.42.06, Python 3.9.18
See https://gist.github.com/boegel/6206884e4a77bbea9e9f3adfb13ad717 for a full test report.
