Releases: OpenBMB/llama.cpp

b4466

13 Jan 04:04
924518e
Reset color before we exit (#11205)

We don't want colors to leak post termination of llama-run.

Signed-off-by: Eric Curtin <[email protected]>
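
The idea behind this fix can be illustrated with a minimal sketch. It is not the code from the commit: `example_colorize` and `example_color_reset_guard` are hypothetical names, and the only fact assumed is that the terminal understands the standard ANSI reset sequence `ESC[0m`.

```cpp
#include <cstdio>
#include <string>

// ANSI SGR sequence that resets all text attributes to the terminal default.
static const char * const ANSI_RESET = "\033[0m";

// Hypothetical helper: wrap text in a color sequence and always close it
// with the reset sequence, so the color cannot leak into later output.
std::string example_colorize(const std::string & text, const std::string & color_seq) {
    return color_seq + text + ANSI_RESET;
}

// Hypothetical RAII guard: emits the reset sequence when it goes out of scope,
// e.g. at the end of main(), covering every normal return path.
struct example_color_reset_guard {
    std::FILE * out;
    explicit example_color_reset_guard(std::FILE * f) : out(f) {}
    ~example_color_reset_guard() {
        std::fputs(ANSI_RESET, out);
        std::fflush(out);
    }
};
```

A guard like this only runs on normal scope exit; paths that bypass destructors (such as `std::exit` or a signal) would need their own handling.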

b4263

04 Dec 10:44
253b7fd
Fix HF repo commit to clone lora test models (#10649)

b4049

08 Nov 07:48
76c6e7f
server : minor UI fix (#10207)

b3923

16 Oct 09:20
becfd38
[CANN] Fix cann compilation error (#9891)

Fix the CANN compilation error introduced after merging llama.cpp's support for dynamically loadable backends.

b3918

14 Oct 09:31
fix memory leaks in minicpmv

b3917

14 Oct 09:00
a89f75e
server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>

b3916

14 Oct 07:31
13dca2a
Vectorize load instructions in dmmv f16 CUDA kernel (#9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b3899

09 Oct 08:24
dca1d4b
ggml : fix BLAS with unsupported types (#9775)

* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
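
The two ideas in this entry (returning a pointer instead of a copy, and skipping BLAS for types without a `to_float` conversion) can be sketched as follows. Everything here is a hypothetical stand-in: the `example_*` types and functions are not the real ggml definitions, and the actual traits structure has more fields.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only, not the real ggml types.
enum example_type { EXAMPLE_TYPE_F32 = 0, EXAMPLE_TYPE_OPAQUE = 1, EXAMPLE_TYPE_COUNT };

typedef void (*example_to_float_t)(const void * src, float * dst, std::size_t n);

struct example_type_traits {
    const char *       name;
    example_to_float_t to_float; // nullptr when the type has no float conversion
};

static void example_f32_to_float(const void * src, float * dst, std::size_t n) {
    const float * s = static_cast<const float *>(src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = s[i];
    }
}

// One static table of traits, indexed by type.
static const example_type_traits k_example_traits[EXAMPLE_TYPE_COUNT] = {
    { "f32",    example_f32_to_float },
    { "opaque", nullptr              },
};

// Returns a pointer into the static table, so callers never copy the struct.
const example_type_traits * example_get_type_traits(example_type t) {
    return &k_example_traits[t];
}

// Only take a float-based path (such as BLAS) when a conversion exists.
bool example_can_use_blas(example_type t) {
    return example_get_type_traits(t)->to_float != nullptr;
}
```

Returning `const example_type_traits *` keeps the lookup cheap on hot paths and makes the "no conversion available" case an explicit null check rather than a call through an unset function pointer.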

b3848

30 Sep 08:26
c919d5d
ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau <[email protected]>

b3669

05 Sep 15:16
4db0478
cuda : fix defrag with quantized KV (#9319)