add hipBlas support #94

Cyberhan123 · 2023-11-28T12:33:13Z

This feature is ready to be merged. @leejet
Update: Thanks to leejet's efforts, we can use the master branch directly. I think I need to wait for the following PRs to be merged:
ggerganov/ggml#683
ggerganov/ggml#682

Cyberhan123 · 2023-12-30T06:01:07Z

I'm a little confused now:
ggml has removed n_dims from its main branch, but our branch still has it. What should I do, given that I want to modify ggml?

@leejet @FSSRepo

Edit: I mean, do we have to follow the master branch and make adjustments?

Cyberhan123 · 2023-12-30T06:33:32Z

This is the performance of an AMD Radeon RX 7900 XTX on Windows:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0
[INFO]  stable-diffusion.cpp:5386 - loading model from './models/miniSD.ckpt'
[INFO]  model.cpp:641  - load ./models/miniSD.ckpt using checkpoint format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion 1.x
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f32
[INFO]  stable-diffusion.cpp:5577 - total memory buffer size = 2731.37MB (clip 470.66MB, unet 2165.24MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from './models/miniSD.ckpt' completed, taking 10.19s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 59 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 8.84it/s
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 2.35s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 2.39s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 0.13s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 0.13s
[INFO]  stable-diffusion.cpp:6592 - txt2img completed in 2.58s
[INFO]  main.cpp:538  - save result image to 'output.png'

leejet · 2023-12-30T06:36:04Z

> I'm a little confused now: ggml has removed n_dims from the main branch, but we still have it in our branch, so what should I do because I want to modify ggml.
>
> @leejet @FSSRepo
>
> edit: I mean do we have to follow the master branch and make adjustments?

I'll attempt to adapt to the latest version of ggml.

ggerganov · 2023-12-30T07:40:20Z

There is ggml_n_dims, which can be used to compute the n_dims of a tensor. It's a drop-in replacement in 99% of cases.
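
A minimal sketch of why a shape-derived count usually, but not always, matches the removed field. The names below (sketch_tensor, sketch_n_dims, SKETCH_MAX_DIMS) are hypothetical stand-ins rather than ggml's API, and the loop assumes the convention that unused trailing dimensions have size 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ggml's maximum tensor rank. */
#define SKETCH_MAX_DIMS 4

/* Hypothetical stand-in for a tensor: only the shape is kept.
 * Assumption: unused trailing dimensions are stored as 1. */
struct sketch_tensor {
    int64_t ne[SKETCH_MAX_DIMS]; /* number of elements per dimension */
};

/* Derive the dimension count from the shape alone: find the last
 * dimension holding more than one element and return its index + 1.
 * A tensor whose dimensions are all 1 is reported as 1-D. */
static int sketch_n_dims(const struct sketch_tensor *t) {
    assert(t != NULL);
    for (int i = SKETCH_MAX_DIMS - 1; i >= 1; --i) {
        if (t->ne[i] > 1) {
            return i + 1;
        }
    }
    return 1;
}
```

For a shape such as {3, 3, 320, 320} the function returns 4, which is what a stored count would also have said. For a tensor that was created as a 2-D column {320, 1}, it returns 1, because the trailing size-1 dimension cannot be told apart from padding. Code that branched on the stored count for such shapes is presumably the remaining 1% and needs a manual check when migrating.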

Cyberhan123 · 2024-01-10T14:50:20Z

@leejet I think everything is ready, but the worst part is that I don't have an AMD GPU at the moment. I have to wait until I get home this weekend to test it.

Cyberhan123 · 2024-01-10T14:52:41Z

By the way, I have replaced ggml with the latest upstream version. I also hope that people who use CUDA can test whether there are any problems.

leejet · 2024-01-12T12:58:51Z

Thank you for your contribution. I'll find time to review and merge this PR.

Cyberhan123 · 2024-01-12T13:15:58Z

> Thank you for your contribution. I'll find time to review and merge this PR.

Please wait until I have tested it; I don't have time right now.

Cyberhan123 · 2024-01-12T17:16:56Z

It works fine now. When I generate a smaller image (256x256), twenty sampling steps take only about 1.8 seconds, and the whole txt2img run completes in about 2 seconds:

 ./bin/sd -m "../models/miniSD.ckpt" -p "a cat" -H 256 -W 256              
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
[INFO ] stable-diffusion.cpp:137  - loading model from '../models/miniSD.ckpt'
[INFO ] model.cpp:642  - load ../models/miniSD.ckpt using checkpoint format
[INFO ] stable-diffusion.cpp:163  - Stable Diffusion 1.x 
[INFO ] stable-diffusion.cpp:169  - Stable Diffusion weight type: f32
[INFO ] stable-diffusion.cpp:268  - total memory buffer size = 2749.37MB (clip 479.66MB, unet 2165.24MB, vae 104.47MB)
[INFO ] stable-diffusion.cpp:270  - loading model from '../models/miniSD.ckpt' completed, taking 9.26s
[INFO ] stable-diffusion.cpp:284  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:1182 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 46 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 12.18it/s
[INFO ] stable-diffusion.cpp:1247 - sampling completed, taking 1.77s
[INFO ] stable-diffusion.cpp:1255 - generating 1 latent images completed, taking 1.80s
[INFO ] stable-diffusion.cpp:1257 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1267 - latent 1 decoded, taking 0.14s
[INFO ] stable-diffusion.cpp:1271 - decode_first_stage completed, taking 0.14s
[INFO ] stable-diffusion.cpp:1290 - txt2img completed in 1.99s
save result image to 'output.png'

There are more features that I haven't tested yet, due to well-known network issues :)
Although I haven't tested other models, I believe they can work properly, so I think this one can be merged.

Cyberhan123 and others added 4 commits November 28, 2023 20:30

add hipBlas support

0c3b14c

Merge branch 'leejet:master' into support-hipblas

dd0fd8f

fix build fail

4c289c8

Merge branch 'master' into support-hipblas

2b84bc3

# Conflicts: # CMakeLists.txt

Cyberhan123 added 2 commits December 30, 2023 14:17

change to latest support logic

89028f3

add full documents for hipBLAS

16cf205

Cyberhan123 added 4 commits January 6, 2024 00:15

Merge branch 'master' into support-hipblas

ba5280e

# Conflicts: # ggml # stable-diffusion.cpp

sync ggml

bf0c9c6

fix cmake broken

f106efe

add cmake macro define for sd

4b05b9d

fix sd_type_t conversion to ggml_type

f4d9e9f

leejet added 2 commits January 14, 2024 11:46

Merge branch 'master' into support-hipblas

041444c

fix gguf support

a76fcd8

leejet merged commit c6071fa into leejet:master Jan 14, 2024
7 checks passed

Cyberhan123 deleted the support-hipblas branch January 14, 2024 09:58
