Documentation: Add llamacpp conf example
This patch adds a configuration example for llamacpp.
See ml_inference_offloading/src/main/assets/models/README for how to download the model.

Signed-off-by: Yelin Jeong <[email protected]>
niley7464 authored and jaeyun-jung committed Dec 20, 2024
1 parent 74de764 commit 6b9f4d5
Showing 2 changed files with 23 additions and 0 deletions.
17 changes: 17 additions & 0 deletions documentation/example_conf/llamacpp.conf
@@ -0,0 +1,17 @@
{
    "single" :
    {
        "framework" : "llamacpp",
        "model" : ["rocket-3b.Q4_0.gguf"],
        "input_info" : [
            {
                "format" : "flexible"
            }
        ],
        "output_info" : [
            {
                "format" : "flexible"
            }
        ]
    }
}
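Since the conf is plain JSON, a quick sanity check can catch typos before deploying it. Below is a minimal Python sketch, assuming the file sits at `documentation/example_conf/llamacpp.conf` relative to where you run it:

```python
import json

# Minimal sanity check for the example conf; the path is an assumption
# (repo-root relative).
CONF_PATH = "documentation/example_conf/llamacpp.conf"

with open(CONF_PATH) as f:
    conf = json.load(f)

single = conf["single"]
assert single["framework"] == "llamacpp"
assert single["model"] == ["rocket-3b.Q4_0.gguf"]
# input_info/output_info use the "flexible" format, so tensor shapes are
# negotiated at runtime instead of being fixed in the conf.
assert all(entry["format"] == "flexible" for entry in single["input_info"])
assert all(entry["format"] == "flexible" for entry in single["output_info"])
print("llamacpp.conf looks well-formed")
```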
6 changes: 6 additions & 0 deletions ml_inference_offloading/src/main/assets/models/README.md
@@ -4,6 +4,12 @@

### yolov8s_float32

### llamacpp

To run a llamacpp model, copy a gguf file into this directory.
You can download a small LLM gguf model [here](https://huggingface.co/TheBloke/rocket-3B-GGUF).
To enable optimized GEMM/GEMV kernels, use a Q4_0 to Q4_0_x_x quantization with the [prebuilt libraries](https://github.com/nnstreamer/nnstreamer-android-resource).
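For reference, the model file named in the example conf can be fetched with the `huggingface_hub` Python package. This is a minimal sketch, and the `local_dir` below is an assumption about where this directory lives in your checkout:

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Fetch the Q4_0 file referenced by documentation/example_conf/llamacpp.conf.
# local_dir is an assumption; adjust it to wherever this models directory
# lives in your checkout.
path = hf_hub_download(
    repo_id="TheBloke/rocket-3B-GGUF",
    filename="rocket-3b.Q4_0.gguf",
    local_dir="ml_inference_offloading/src/main/assets/models",
)
print("saved to", path)
```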

### llama2c

To run a llama2c model, copy the model.bin and tokenizer.bin files into this directory.
