Documenting my steps installing/running Intern/Cog on Linux with an AMD graphics card (6800XT) #165
ProtoBelisarius started this conversation in Show and tell
Reply:
There is an official bitsandbytes port now, which supports ROCm 6.x and should land upstream at some point: https://github.com/ROCm/bitsandbytes/tree/rocm_enabled
Original post:
This was written while using commit 5edb6f1.
My system is Arch-based (EndeavourOS), and my hardware is a Ryzen 9 5900X, 64 GB of RAM and a Radeon RX 6800 XT with 16 GB of VRAM.
I have pretty much all of the ROCm 6.0 packages from the extra repo installed, but I also had to install the "hipblaslt" package from extra for bitsandbytes to compile.
Then I did the following steps:
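```shell
git clone https://github.com/jhc13/taggui
cd taggui/
python3.11 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
# replace the PyTorch wheels from requirements.txt with the ROCm 5.7 builds
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7 --force-reinstall
# xformers doesn't compile for AMD, so remove it
pip uninstall xformers
# build bitsandbytes from the ROCm fork (rocm branch)
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
cd bitsandbytes-rocm-5.6
git fetch
git branch -a
git checkout rocm
# look up the GPU's gfx target (the RX 6800 XT is gfx1030)
rocminfo
make hip ROCM_TARGET=gfx1030
pip install .
# check that the bitsandbytes install works
python -m bitsandbytes
cd ..
python taggui/run_gui.py
```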
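The `rocminfo` call in these steps is only there to find the GPU's gfx target for `ROCM_TARGET`. A small sketch of pulling that value out automatically, assuming the target appears as a `gfx` token followed by hex digits in `rocminfo`'s plain-text output (as `gfx1030` does for the 6800 XT). `gfx_targets` and `detect_gfx_targets` are hypothetical helper names, not part of ROCm:

```python
import re
import subprocess

# Assumption: rocminfo prints GPU ISA names as "gfx" followed by hex digits,
# e.g. "Name: gfx1030" or "amdgcn-amd-amdhsa--gfx1030".
GFX_RE = re.compile(r"\bgfx[0-9a-f]+\b")


def gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the unique gfx targets in rocminfo output, in order of appearance."""
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    return list(dict.fromkeys(GFX_RE.findall(rocminfo_output)))


def detect_gfx_targets() -> list[str]:
    """Run rocminfo (must be on PATH) and return the gfx targets it reports."""
    result = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    )
    return gfx_targets(result.stdout)
```

`detect_gfx_targets()` returns a list such as `['gfx1030']`, whose entry is what goes into `make hip ROCM_TARGET=...`. On a machine with more than one AMD GPU the list may contain several targets, so check which one belongs to the card you want to use.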
Now internlm-xcomposer2-vl-7b-4bit works.
Credit for the bitsandbytes guide goes to https://llm-tracker.info/howto/AMD-GPUs#bitsandbytes
THUDM/cogvlm-chat-hf in 4-bit needs `pip install protobuf==3.20.3`; the workaround with `--no-binary=protobuf` doesn't work for me. I haven't tried other protobuf versions, though. It also needs a modification to the inference code downloaded from Hugging Face, because that code requires xformers, which currently doesn't compile for AMD. For this, go to
`/home/USERNAME/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf/e29dc3ba206d524bf8efbfc60d80fc4556ab0e3c/visual.py`
and comment out or remove `import xformers.ops as xops` on the fourth line. Then remove lines 40-42:

```python
out = xops.memory_efficient_attention(
    q, k, v, scale=self.scale,
)
```

and replace them with the following, keeping the existing indentation:

```python
out = self.attention(q, k, v)
```
Save the file, and it should work without errors. I'm not sure whether this creates hidden problems: it seems to work fine for me and captioning works well, but it could still have unforeseen consequences.
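The manual edit to `visual.py` can also be scripted, which saves hunting for the revision-hash directory in the cache path. A minimal sketch, assuming the default Hugging Face cache location (setting `HF_HOME` moves it) and that `visual.py` contains the import and the `memory_efficient_attention` call exactly as quoted above. `MODULE_DIR`, `patch_visual_source` and `patch_cached_visual_files` are hypothetical names. The script reproduces the edit as described, so whatever hidden problems the edit may have carry over too:

```python
import re
from pathlib import Path

# Assumption: default Hugging Face modules cache; setting HF_HOME moves it.
MODULE_DIR = (
    Path.home()
    / ".cache/huggingface/modules/transformers_modules/THUDM/cogvlm-chat-hf"
)

# The xformers import on line 4 of visual.py, as quoted in the post.
IMPORT_RE = re.compile(r"^import xformers\.ops as xops$", re.MULTILINE)

# The xformers call on lines 40-42 (may span several lines);
# group 1 captures the indentation so the replacement keeps it.
XOPS_CALL_RE = re.compile(
    r"^([ \t]*)out = xops\.memory_efficient_attention\(\s*q, k, v, scale=self\.scale,?\s*\)",
    re.MULTILINE,
)


def patch_visual_source(source: str) -> str:
    """Apply the manual edit from the post to the text of visual.py."""
    # Comment out the xformers import.
    patched = IMPORT_RE.sub("# import xformers.ops as xops", source)
    # Replace the xformers call with the module's own attention method.
    return XOPS_CALL_RE.sub(r"\1out = self.attention(q, k, v)", patched)


def patch_cached_visual_files(module_dir: Path = MODULE_DIR) -> list[Path]:
    """Patch every cached revision of visual.py and return the files changed."""
    changed = []
    for path in sorted(module_dir.glob("*/visual.py")):
        original = path.read_text()
        patched = patch_visual_source(original)
        if patched != original:
            # Keep a backup next to the original before overwriting it.
            path.with_name(path.name + ".bak").write_text(original)
            path.write_text(patched)
            changed.append(path)
    return changed
```

Calling `patch_cached_visual_files()` once `visual.py` is present in the cache patches it in place and leaves a `visual.py.bak` copy next to it. Running it a second time changes nothing, because neither pattern matches the already-patched text.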