{ai,tools}[GCCcore/12.3.0,foss/2023a] CTranslate2 v4.5.0, cpu_features v0.9.0 w/ CUDA 12.1.1 #22119
…features-0.9.0-GCCcore-12.3.0.eb and patches: CTranslate2-4.5.0_replace-cxxopts.patch, CTranslate2-4.5.0_fix-third-party.patch, CTranslate2-4.5.0_fix-tests.patch
Updated software
|
@boegelbot please test @ generoso |
@pavelToman: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2572993551 processed Message to humans: this is just bookkeeping information for me, |
Test report by @pavelToman |
Test report by @boegelbot |
@boegelbot please test @ jsc-zen3-a100 |
@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de PR test command '
Test results coming soon (I hope)... - notification for comment with ID 2573056618 processed Message to humans: this is just bookkeeping information for me, |
Test report by @boegelbot |
How does this work on one system, but fail on another... :-/ |
Test report by @boegel |
1 and 2 fail, as per #22119 (comment) 3 builds. |
It could be what @branfosj mentioned, but I also see this on both the V100 (
That's incorrect, since A100 only supports CUDA compute capability 8.0, while V100 only supports 7.0 (see https://developer.nvidia.com/cuda-gpus). I tested myself on our A100 system using |
You need to use something of the form My limited testing suggests that it fails with
if(NOT CUDA_ARCH_LIST)
  set(CUDA_ARCH_LIST "Auto")
elseif(CUDA_ARCH_LIST STREQUAL "Common")
  set(CUDA_ARCH_LIST ${CUDA_COMMON_GPU_ARCHITECTURES})
  # Keep deprecated but not yet dropped Compute Capabilities.
  if(CUDA_VERSION_MAJOR EQUAL 11)
    list(INSERT CUDA_ARCH_LIST 0 "3.5" "5.0")
  endif()
  list(REMOVE_DUPLICATES CUDA_ARCH_LIST)
endif() |
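To make the branching in the quoted CMake fragment easier to follow, here is a minimal Python sketch that mirrors it. It is only an illustration: the function name `resolve_cuda_arch_list` and its parameters are hypothetical, the contents of `CUDA_COMMON_GPU_ARCHITECTURES` are passed in by the caller rather than assumed, and nothing here is taken from the CTranslate2 sources beyond the fragment above.

```python
# Illustrative mirror of the CMake fragment quoted above; not CTranslate2 code.

def resolve_cuda_arch_list(cuda_arch_list, common_archs, cuda_version_major):
    """Return the effective architecture list, following the quoted CMake branches."""
    if not cuda_arch_list:
        # if(NOT CUDA_ARCH_LIST) -> set(CUDA_ARCH_LIST "Auto")
        return ["Auto"]
    if cuda_arch_list == "Common":
        # set(CUDA_ARCH_LIST ${CUDA_COMMON_GPU_ARCHITECTURES})
        archs = list(common_archs)
        if cuda_version_major == 11:
            # list(INSERT CUDA_ARCH_LIST 0 "3.5" "5.0")
            archs = ["3.5", "5.0"] + archs
        # list(REMOVE_DUPLICATES ...) keeps the first occurrence of each item
        return list(dict.fromkeys(archs))
    # Any other value is left as given by the fragment; it is split on
    # whitespace here only so that the function always returns a list.
    return cuda_arch_list.split()


print(resolve_cuda_arch_list("", ["7.0", "8.0"], 12))
print(resolve_cuda_arch_list("Common", ["5.0", "7.0", "8.0"], 11))
```

The point of the sketch is that an unset value and the literal value "Common" are both special-cased, while an explicit list is taken as provided.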
Test report by @pavelToman |
The error is |
I found that the V100 GPU could be the problem: |
CT2_CUDA_ALLOW_BF16=0 does not help on the V100, but downgrading to version 4.3.1 does. It seems the latest versions (4.4.0, 4.5.0) no longer support CUDA compute capability 7.0. |
Is this something I can fix by add |
OK, fine, then we can't install this on our V100 system, no problem.
No, people should correctly configure EasyBuild to use the CUDA compute capability that matches their GPU; there should be no system-specific things in the easyconfigs. |
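As a rough sketch of what site-level configuration could look like, the snippet below builds an `eb` command line that sets the compute capabilities through EasyBuild's `--cuda-compute-capabilities` configuration option instead of hard-coding them in the easyconfig. The GPU-to-capability table only restates the values given in this thread; the helper name `eb_command` and the easyconfig filename `example.eb` are placeholders, not names from this PR.

```python
import shlex

# Compute capabilities as stated in this thread (A100: 8.0, V100: 7.0).
GPU_COMPUTE_CAPABILITY = {"A100": "8.0", "V100": "7.0"}


def eb_command(easyconfig, gpu_models):
    """Build an eb command line that targets the GPUs present at a site."""
    # Collect the distinct capabilities and sort them numerically.
    ccs = sorted({GPU_COMPUTE_CAPABILITY[m] for m in gpu_models}, key=float)
    option = "--cuda-compute-capabilities=" + ",".join(ccs)
    return shlex.join(["eb", option, easyconfig])


# "example.eb" is a placeholder filename.
print(eb_command("example.eb", ["A100"]))
```

The same setting can also be kept in a site's EasyBuild configuration, so that easyconfigs stay free of system-specific values.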
description = "Fast inference engine for Transformer models."

toolchain = {'name': 'foss', 'version': '2023a'}
toolchainopts = {'optarch': False}
@pavelToman Any specific reason you have this here?
Are there problems when this is not used?
Yes, without toolchainopts = {'optarch': False} the build fails.
The log of the build without optarch is here: https://github.com/vscentrum/vsc-software-stack/blob/wip/482_WhisperX/log2.txt
Pavel told me that this is required to dance around a compiler problem, like:
/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0/include/avx512fp16intrin.h(38): error: vector_size attribute requires an arithmetic or enum type
__v8hf __attribute__ ((__vector_size__ (16)));
Maybe we should consider using a more recent CUDA version here, can you check if that would be feasible @pavelToman ?
It seems GCC-12.3.0 and CUDA-12.1.1 are incompatible, so I replaced CUDA-12.1.1 with version 12.6.0 and it works!
Test report by @boegel |
Replacing this PR with a version that uses CUDA-12.6.0: |
(created using eb --new-pr)
dependency of WhisperX: vscentrum/vsc-software-stack#482
dependency of whisper-ctranslate2: vscentrum/vsc-software-stack#483
REPLACED BY: #22134