Conversation
CPU inference seems to work, at least for llama. CUDA/OpenCL are currently broken.
As requested:
Thanks for the quick test, I don't think I can debug that as I can't reproduce it. The failure with Metal enabled is expected, as I haven't touched any of the accelerators yet. I think I will focus on getting CUDA and OpenCL functional again.
Sounds good to me - I'll see if I can make some time to push the macOS effort forward, but I suspect I'm going to be really busy for the next week or two 😓 Awesome work so far, though! Let me know if you need me to look at/consult on anything, but everything looks great so far.
The test I did earlier on macOS now works after pulling in your latest changes. I'll fix Metal soon - I'm hoping that it shouldn't be too bad. After that, I'll check each architecture with {CPU, CUDA, Metal}. Maybe OpenCL, but that's honestly kind of a pain to test - might just bounce it against the CI and hope for the best. I may also try to use this PR to update to the latest version once more, but it'll depend on how large that change is. I want to keep the diff small so that I can get this in before #412 and not have to resolve too many merge conflicts - we'll see how we go!
I think I stopped implementing this as I got a lot of "memory access errors" while I tried to get CUDA working. I guess CPU inference should work, but I kind of gave up on getting CUDA to work, as it's just very difficult to debug into GGML from the Rust side. Especially the CUDA bits.
Yeah, understandable - it’s very difficult to debug. I’m out of town for the next few days, but I’ll get back to it after that.
Take your time. I will probably focus on getting the quantized CUDA kernels working in candle over the weekend.
llama.cpp and include ggml-alloc during binding generation
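For context, here is a minimal sketch of what including ggml-alloc during binding generation could look like in a bindgen-based build.rs. The header paths, allowlist patterns, and crate layout below are assumptions for illustration, not the PR's actual configuration:

```rust
// build.rs (sketch): generate ggml bindings and include ggml-alloc.
// The vendored source directory ("llama.cpp/") is an assumed path;
// adjust it to wherever the ggml sources actually live in the crate.
use std::{env, path::PathBuf};

fn main() {
    let bindings = bindgen::Builder::default()
        .header("llama.cpp/ggml.h")        // main ggml header (assumed location)
        .header("llama.cpp/ggml-alloc.h")  // also pull in the ggml-alloc API
        .allowlist_function("ggml_.*")     // limit the generated surface to ggml symbols
        .allowlist_type("ggml_.*")
        .generate()
        .expect("failed to generate ggml bindings");

    // Write the generated Rust declarations into OUT_DIR for include! at build time.
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("failed to write bindings.rs");
}
```

Note that bindgen only declares the symbols; the ggml/ggml-alloc C sources (and any CUDA/OpenCL/Metal backends) still have to be compiled and linked separately, for example via the cc crate.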