Documentation: Add llamacpp conf example
This patch adds a configuration example for llamacpp.
See ml_inference_offloading/src/main/assets/models/README for how to download the model.

Signed-off-by: Yelin Jeong <[email protected]>
niley7464 authored and jaeyun-jung committed Dec 20, 2024
1 parent 74de764 commit 6b9f4d5
Showing 2 changed files with 23 additions and 0 deletions.
17 changes: 17 additions & 0 deletions documentation/example_conf/llamacpp.conf
@@ -0,0 +1,17 @@
{
    "single" :
    {
        "framework" : "llamacpp",
        "model" : ["rocket-3b.Q4_0.gguf"],
        "input_info" : [
            {
                "format" : "flexible"
            }
        ],
        "output_info" : [
            {
                "format" : "flexible"
            }
        ]
    }
}
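Since the conf is plain JSON, a quick sanity check can catch typos before deploying it. Below is a minimal Python sketch, assuming the file sits at `documentation/example_conf/llamacpp.conf` relative to where you run it:

```python
import json

# Minimal sanity check for the example conf; the path is an assumption
# (repo-root relative).
CONF_PATH = "documentation/example_conf/llamacpp.conf"

with open(CONF_PATH) as f:
    conf = json.load(f)

single = conf["single"]
assert single["framework"] == "llamacpp"
assert single["model"] == ["rocket-3b.Q4_0.gguf"]
# input_info/output_info use the "flexible" format, so tensor shapes are
# negotiated at runtime instead of being fixed in the conf.
assert all(entry["format"] == "flexible" for entry in single["input_info"])
assert all(entry["format"] == "flexible" for entry in single["output_info"])
print("llamacpp.conf looks well-formed")
```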
6 changes: 6 additions & 0 deletions ml_inference_offloading/src/main/assets/models/README.md
@@ -4,6 +4,12 @@

### yolov8s_float32

### llamacpp

To run a llamacpp model, copy a gguf file into this directory.
You can download a small LLM gguf model [here](https://huggingface.co/TheBloke/rocket-3B-GGUF).
To enable optimized GEMM/GEMV kernels, use a Q4_0 to Q4_0_x_x quantization with the [prebuilt libraries](https://github.com/nnstreamer/nnstreamer-android-resource).
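For reference, the model file named in the example conf can be fetched with the `huggingface_hub` Python package. This is a minimal sketch, and the `local_dir` below is an assumption about where this directory lives in your checkout:

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Fetch the Q4_0 file referenced by documentation/example_conf/llamacpp.conf.
# local_dir is an assumption; adjust it to wherever this models directory
# lives in your checkout.
path = hf_hub_download(
    repo_id="TheBloke/rocket-3B-GGUF",
    filename="rocket-3b.Q4_0.gguf",
    local_dir="ml_inference_offloading/src/main/assets/models",
)
print("saved to", path)
```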

### llama2c

To run a llama2c model, copy the model.bin and tokenizer.bin files into this directory.
