Matcha Chat 2 is under heavy development! Expect bugs and errors.
See the old version on the MatchaChat 1 branch.
Thank you for your support.
Matcha Chat is a GUI chat app for Windows that wraps llama.cpp to chat with a local language model, built with Python and PySide6.
The interface allows easy installation of llama.cpp, select models from TheBloke, and visual ability via LLaVA (WIP as of version 2), along with message sending, system configuration, Whisper voice input, and character-card management.
- [✔] Easy-to-use installation
- [✔] Character management
- [✔] Hardware acceleration support: CLBlast for CPU/GPU hybrid inference.
- [✔] Native GUI: Not running in a browser or Electron.
- [-] Vision ability: WIP.
- [-] Voice input: speak your own language in your voice, with automatic translation to English (WIP).
- [✔] Built-in translator
- [✔] Wrapped LongLM support
- [-] Long-term memory: currently in development with highest priority.
❤️ All data is stored and computed on your local machine, powered by multiple machine learning models.
Hardware requirements
Devices with 2/4/8 GB of installed RAM can run a quantized 1B/3B/7B model in text-only mode.
You need an additional 4 GB of free RAM to chat with images.
GPU acceleration requires a significant amount of VRAM. Set gpu_layers to 0 unless your machine has a capable GPU.
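As a rough sanity check on the RAM tiers above, a quantized model's size is approximately its parameter count times the bits per weight, divided by 8. This is a back-of-envelope sketch (an assumed formula, not a figure from Matcha Chat; real GGUF files add metadata and mixed-precision layers):

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Rough estimate of a quantized model's in-RAM size in GB.

    This is an approximation: actual GGUF files are somewhat larger due to
    metadata and layers kept at higher precision.
    """
    return params_billions * bits_per_weight / 8


# 1B/3B/7B models at 4-bit quantization fit comfortably in 2/4/8 GB of RAM.
for params in (1, 3, 7):
    print(f"{params}B model @ 4-bit: about {approx_model_size_gb(params):.1f} GB")
```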
Download the built binary executable file for x64 Windows OS from Release, and place it in any empty folder.
Download the llama.cpp backend, and your favorite models.
Voilà!
Before running Matcha Chat from source, ensure you have Python 3.10 and all requirements installed.
pip install -r requirements.txt
python matcha_gui.py
# or: py matcha_gui.py if you are using PowerShell
If you'd like to build a standalone executable:
nuitka --standalone --show-progress --onefile --disable-console --plugin-enable=pyside6 --windows-icon-from-ico=.\icon.ico --output-dir=build_output main_window.py
Q: How do I manually add a .gguf model?
A: Place the model files into the ./models folder, then go to Settings in Matcha Chat to apply.
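The discovery step can be sketched like this; `list_gguf_models` is a hypothetical helper for illustration, and Matcha Chat's own settings page may scan the folder differently:

```python
from pathlib import Path


def list_gguf_models(models_dir: str = "./models") -> list[str]:
    """Return the .gguf files found in the models folder, sorted by name.

    Hypothetical helper: illustrates the scan the settings page performs,
    not Matcha Chat's actual implementation.
    """
    return sorted(p.name for p in Path(models_dir).glob("*.gguf"))


print(list_gguf_models())
```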
Q: Is a larger model better?
A: Not necessarily; it depends on your task and on the model itself. The LLM field is advancing so rapidly that the list of "best" models refreshes within days. However, note that leaderboards are not entirely trustworthy, and many benchmark scores are distorted.
Q: The model outputs too much/just gibberish.
A: Be aware that the preferred generation length differs across models, which is also an important part of the chat experience. If your model generates only gibberish, try changing the parameters, or modify your first message to establish a consistent chat atmosphere.
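For a concrete starting point, these are typical sampling parameters for a llama.cpp server `/completion` request. The values are illustrative assumptions to tune from, not recommendations shipped with Matcha Chat:

```python
# Illustrative sampling parameters for a llama.cpp server /completion request.
# Values are assumptions to experiment with, not Matcha Chat defaults.
payload = {
    "prompt": "...",        # keep the prompt format your model was trained on
    "n_predict": 256,       # cap generation length to curb rambling output
    "temperature": 0.7,     # lower = more deterministic, higher = more varied
    "top_p": 0.9,           # nucleus sampling cutoff
    "repeat_penalty": 1.1,  # values above 1.0 discourage looping/gibberish
    "stop": ["\nUser:"],    # stop strings that end a chat turn cleanly
}
print(payload)
```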
Q: There are so many parameters that I don't understand how to set them!
A: Hover your mouse over a slider to see its tooltip.
Q: How to contribute?
A: Fork and modify the code, raise an issue, or just kiss Setsuna goodnight.
CPU: Intel Core m3-6Y30 (2C4T)
GPU: Intel HD Graphics 515
Memory: Dual-Channel 2x2 GB