Conversation
CPU inference seems to work, at least for llama. CUDA/OpenCL are currently broken.
As requested:
Thanks for the quick test, I don't think I can debug that as I can't reproduce it. The failure with Metal enabled is expected, as I haven't touched any of the accelerators yet. I think I will focus on getting CUDA and OpenCL functional again.
Sounds good to me - I'll see if I can make some time to push the macOS effort forward, but I suspect I'm going to be really busy for the next week or two 😓 Awesome work so far, though! Let me know if you need me to look at/consult on anything, but everything looks great so far.
The test I did earlier on macOS now works after pulling in your latest changes. I'll fix Metal soon - I'm hoping that it shouldn't be too bad. After that, I'll check each architecture with {CPU, CUDA, Metal}. Maybe OpenCL, but that's honestly kind of a pain to test - might just bounce it against the CI and hope for the best. I may also try to use this PR to update to the latest version once more, but it'll depend on how large that change is. I want to keep the diff small so that I can get this in before #412 and not have to resolve too many merge conflicts - we'll see how we go!
I think I stopped implementing this as I got a lot of "memory access errors" while I tried to get CUDA working. I guess CPU inference should work, but I kind of gave up on getting CUDA to work, as it's just very difficult to debug into GGML from the Rust side. Especially the CUDA bits.
Yeah, understandable - it’s very difficult to debug. I’m out of town for the next few days, but I’ll get back to it after that.
Take your time. I will probably focus on getting the quantized CUDA kernels working in candle over the weekend.
llama.cpp and include ggml-alloc during binding generation
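For context, here is a minimal sketch of what including ggml-alloc during binding generation could look like in a bindgen-based build.rs. The header paths, allowlist patterns, and crate layout below are assumptions for illustration, not the PR's actual configuration:

```rust
// build.rs (sketch): generate ggml bindings and include ggml-alloc.
// The vendored source directory ("llama.cpp/") is an assumed path;
// adjust it to wherever the ggml sources actually live in the crate.
use std::{env, path::PathBuf};

fn main() {
    let bindings = bindgen::Builder::default()
        .header("llama.cpp/ggml.h")        // main ggml header (assumed location)
        .header("llama.cpp/ggml-alloc.h")  // also pull in the ggml-alloc API
        .allowlist_function("ggml_.*")     // limit the generated surface to ggml symbols
        .allowlist_type("ggml_.*")
        .generate()
        .expect("failed to generate ggml bindings");

    // Write the generated Rust declarations into OUT_DIR for include! at build time.
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("failed to write bindings.rs");
}
```

Note that bindgen only declares the symbols; the ggml/ggml-alloc C sources (and any CUDA/OpenCL/Metal backends) still have to be compiled and linked separately, for example via the cc crate.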