ROCm support #252
base: main
Conversation
Oh I see, you wrote a Dockerfile! We have no way to test it because we have no AMD GPUs, but maybe we can set up the build process and someone can test it.
@88Ocelot does this mean it will install deepspeed at first launch, and there's no way to install it in the Dockerfile currently? That's a super nice contribution @88Ocelot!
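For reference, an "install at first launch" approach usually boils down to a small runtime check like the sketch below. This is only an illustration of the pattern being discussed, not code from this PR; the function name and the plain `pip install deepspeed` invocation are assumptions.

```python
# Hypothetical sketch of an "install deepspeed on first launch" check; not taken from this PR.
import importlib.util
import subprocess
import sys

def ensure_deepspeed() -> None:
    # Skip the install if deepspeed is already importable (e.g. it was baked into the image).
    if importlib.util.find_spec("deepspeed") is not None:
        return
    # Otherwise install it at runtime; on ROCm this relies on a HIP-enabled torch already being present.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "deepspeed"])

if __name__ == "__main__":
    ensure_deepspeed()
```

Baking the package into the Dockerfile instead would move this cost from first launch to image build time.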
    && python -m self_hosting_machinery.watchdog.docker_watchdog'
    image: refact_self_hosting_rocm
    build:
      dockerfile: rocm.Dockerfile
  build:
+   context: .
    dockerfile: rocm.Dockerfile
This was the only issue I found with this build so far :D I am testing it right now, just waiting for the models to download.
After some building and testing I have encountered a big issue:
refact_self_hosted_1 | -- 11 -- 20240102 00:08:39 MODEL STATUS loading model
refact_self_hosted_1 | -- 11 -- 20240102 00:08:39 MODEL loading model local_files_only=1
refact_self_hosted_1 | -- 11 -- 20240102 00:08:40 MODEL Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
refact_self_hosted_1 | -- 11 -- 20240102 00:08:40 MODEL CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
refact_self_hosted_1 | -- 11 -- 1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
refact_self_hosted_1 | -- 11 -- 2. You are using pytorch without CUDA support.
refact_self_hosted_1 | -- 11 -- 3. CUDA and nvcc are not installed in your device.
refact_self_hosted_1 | -- 11 -- 20240102 00:08:40 MODEL lm_head not been quantized, will be ignored when make_quant.
refact_self_hosted_1 | -- 11 -- 20240102 00:08:40 MODEL CUDA extension not installed.
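Those messages suggest the container may have ended up with a CPU-only or CUDA-oriented PyTorch rather than a ROCm build. A quick sanity check inside the container can narrow this down; the sketch below is not part of the PR and only uses standard torch attributes (`torch.version.hip` is set on ROCm builds, and `torch.cuda.is_available()` reports True when the HIP backend can see a GPU).

```python
# Hedged diagnostic sketch: verify that the PyTorch inside the container is a ROCm (HIP) build
# and that it can actually see an AMD GPU.
import torch

print("torch version:      ", torch.__version__)
print("built against CUDA: ", torch.version.cuda)  # None on ROCm builds
print("built against HIP:  ", torch.version.hip)   # a version string on ROCm builds, None otherwise

if torch.version.hip is None:
    print("Not a ROCm build of PyTorch; auto_gptq kernels will not have HIP support.")
elif not torch.cuda.is_available():
    print("ROCm build detected, but no GPU is visible (check the /dev/kfd and /dev/dri mounts).")
else:
    print("GPU visible:", torch.cuda.get_device_name(0))
```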
After some testing today I can say that, sadly, we need to wait a bit longer to make this happen. For example, flash_attention will probably only work from ROCm 5.7 once it gets a stable release. I saw that you tried some workarounds, but I believe they did not work due to ROCm library differences.
So far, even when it built and started, most of the time I just got a timeout error and the model was not loaded properly.
Initial support for ROCm