
add hipBlas support #94

Merged · 13 commits merged from support-hipblas into leejet:master on Jan 14, 2024

Conversation

@Cyberhan123 (Contributor) commented Nov 28, 2023

This feature is ready to be merged. @leejet
Update: Thanks to leejet's efforts, we can use the master branch directly. I think I need to wait for the following PRs to be merged:
ggerganov/ggml#683
ggerganov/ggml#682
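For anyone wanting to try the ROCm/HIP build this PR enables, a configure sketch might look like the following. The flag names (`SD_HIPBLAS`, `AMDGPU_TARGETS`) and compiler paths are assumptions based on the project's documentation; check the repository's README for the exact options, and set `AMDGPU_TARGETS` to your GPU architecture (e.g. gfx1100 for RDNA3 cards like the RX 7900 XTX).

```shell
# Hypothetical hipBLAS build sketch for stable-diffusion.cpp on ROCm.
# Flag and path names are assumptions; verify against the repo's README.
cmake -S . -B build \
    -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
    -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
    -DSD_HIPBLAS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DAMDGPU_TARGETS=gfx1100
cmake --build build --config Release
```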

@Cyberhan123 (Contributor, author) commented Dec 30, 2023

I'm a little confused now: ggml has removed n_dims from its master branch, but we still use it in our branch. What should I do, since I want to modify ggml?

@leejet @FSSRepo

edit: I mean, do we have to follow the master branch and make adjustments?

@Cyberhan123 (Contributor, author)
This is the performance of an AMD Radeon RX 7900 XTX on Windows:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0
[INFO]  stable-diffusion.cpp:5386 - loading model from './models/miniSD.ckpt'
[INFO]  model.cpp:641  - load ./models/miniSD.ckpt using checkpoint format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion 1.x
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f32
[INFO]  stable-diffusion.cpp:5577 - total memory buffer size = 2731.37MB (clip 470.66MB, unet 2165.24MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from './models/miniSD.ckpt' completed, taking 10.19s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 59 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 8.84it/s
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 2.35s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 2.39s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 0.13s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 0.13s
[INFO]  stable-diffusion.cpp:6592 - txt2img completed in 2.58s
[INFO]  main.cpp:538  - save result image to 'output.png'

@leejet (Owner) commented Dec 30, 2023

> I'm a little confused now: ggml has removed n_dims from its master branch, but we still use it in our branch. What should I do, since I want to modify ggml?
>
> @leejet @FSSRepo
>
> edit: I mean, do we have to follow the master branch and make adjustments?

I'll attempt to adapt to the latest version of ggml.

@ggerganov (Contributor)

There is ggml_n_dims, which can be used to compute the n_dims of a tensor. It's a drop-in replacement in 99% of cases.

@Cyberhan123 (Contributor, author)

@leejet I think everything is ready, but the worst part is that I don't have an AMD GPU at the moment. I have to wait until I get home this weekend to test it.

@Cyberhan123 (Contributor, author)

By the way, I have updated ggml to the latest upstream version. I also hope that friends who use CUDA can test whether there are any problems.

@leejet (Owner) commented Jan 12, 2024

Thank you for your contribution. I'll find time to review and merge this PR.

@Cyberhan123 (Contributor, author)

> Thank you for your contribution. I'll find time to review and merge this PR.

Please wait while I test it; I don't have time right now.

@Cyberhan123 (Contributor, author) commented Jan 12, 2024

Now it works fine. When I generate a smaller image (256x256), twenty sampling steps take only 1.9 seconds:

 ./bin/sd -m "../models/miniSD.ckpt" -p "a cat" -H 256 -W 256              
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
[INFO ] stable-diffusion.cpp:137  - loading model from '../models/miniSD.ckpt'
[INFO ] model.cpp:642  - load ../models/miniSD.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:163  - Stable Diffusion 1.x 
[INFO ] stable-diffusion.cpp:169  - Stable Diffusion weight type: f32
[INFO ] stable-diffusion.cpp:268  - total memory buffer size = 2749.37MB (clip 479.66MB, unet 2165.24MB, vae 104.47MB)
[INFO ] stable-diffusion.cpp:270  - loading model from '../models/miniSD.ckpt' completed, taking 9.26s
[INFO ] stable-diffusion.cpp:284  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1182 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 46 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 12.18it/s
[INFO ] stable-diffusion.cpp:1247 - sampling completed, taking 1.77s
[INFO ] stable-diffusion.cpp:1255 - generating 1 latent images completed, taking 1.80s
[INFO ] stable-diffusion.cpp:1257 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1267 - latent 1 decoded, taking 0.14s
[INFO ] stable-diffusion.cpp:1271 - decode_first_stage completed, taking 0.14s
[INFO ] stable-diffusion.cpp:1290 - txt2img completed in 1.99s
save result image to 'output.png'

There are more features that I haven't tested yet, due to well-known network issues :)
Although I haven't tested other models, I believe they will work properly, so I think this PR can be merged.

@leejet leejet merged commit c6071fa into leejet:master Jan 14, 2024
7 checks passed
@Cyberhan123 Cyberhan123 deleted the support-hipblas branch January 14, 2024 09:58