EPIC: Run Self-hosted version on CPU #191
Comments
I'd like to mention that if you handle |
I would be interested in having an option to run on a CPU too, in addition to the GPU, just to maximise the benefit I get from the GPUs I have available. For example, running StarCoder 7B on my GPU for code completion and Llama 7B on the CPU for chat functionality in the VS Code plugin. Right now, if I want both functionalities, I have to resort to the smallest models to make sure they fit in my graphics card's VRAM. |
Hi @octopusx |
@olegklimov for sure I don't want to run anything on the CPU if I can avoid it, and especially not the code completion part. I was only thinking of moving the chat function to the CPU to free up my GPU for higher quality code completion. Currently I run llama.cpp on CPU for chat-based openAPI integrations with a Llama 2 7B chat model (https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_K_M.gguf), and on a Ryzen 3000 series CPU I am getting close to instant chat responses. The key issue with this setup is that I cannot point my refact plugin to the llama.cpp endpoint for chat, and I cannot point the other chat integrations to the self-hosted refact, so I basically have to host two solutions at the same time... |
Ah I see, that makes total sense. I think the best way to solve this is to add providers to the Rust layer for the new plugins. We'll release the plugins "as is" this week, because we need to get them out and start collecting feedback. Then, around next week, we'll add the concept of providers to the Rust layer. Hopefully you'll then be able to direct requests to your llama.cpp server. |
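To make the "providers" idea a bit more concrete, here is a rough sketch of what per-function routing could look like. It is written in Python purely for illustration and is not the actual design of the Rust layer. The `Provider` class, the `PROVIDERS` table, the `resolve_endpoint` helper, and every address and path in it are assumptions made up for this example, so adjust them to whatever your servers really expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """A hypothetical backend that serves one kind of request."""
    name: str
    base_url: str  # assumed address, not taken from the thread
    path: str      # assumed route, check your server's documentation


# Hypothetical routing table: code completion stays on the GPU-backed
# self-hosted server, chat is sent to a llama.cpp server running on the CPU.
PROVIDERS = {
    "completion": Provider("refact-self-hosted", "http://127.0.0.1:8008", "/v1/completions"),
    "chat": Provider("llama.cpp", "http://127.0.0.1:8080", "/v1/chat/completions"),
}


def resolve_endpoint(function: str) -> str:
    """Return the full URL that a request of the given kind should be sent to."""
    try:
        provider = PROVIDERS[function]
    except KeyError:
        raise ValueError(f"no provider configured for {function!r}") from None
    return provider.base_url.rstrip("/") + provider.path
```

With a table like this, a plugin would only need to look up `resolve_endpoint("chat")` or `resolve_endpoint("completion")` before sending a request, so the chat model and the completion model could live on different machines or different hardware.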
This is amazing, I will be on the lookout for the new releases and test this as soon as it's available. |
No description provided.