Releases · OpenBMB/llama.cpp

13 Jan 04:04

924518e

b4466 Latest

Reset color before we exit (#11205)

We don't want colors to leak post termination of llama-run.

Signed-off-by: Eric Curtin <[email protected]>
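As an illustration of the idea behind this change, the following is a minimal sketch, not the actual llama-run patch. It assumes the program colors its output with ANSI escape sequences, where `\033[0m` resets all attributes. The names `ColorResetGuard`, `print_colored`, and `demo_colored_output` are hypothetical and do not come from the repository.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Standard ANSI SGR sequences: 31 selects a red foreground, 0 resets all attributes.
constexpr const char * ANSI_RED   = "\033[31m";
constexpr const char * ANSI_RESET = "\033[0m";

// Hypothetical RAII guard: writes the reset sequence when it goes out of scope,
// so color does not leak past the scope that enabled it.
class ColorResetGuard {
public:
    explicit ColorResetGuard(std::ostream & out) : out_(out) {}
    ~ColorResetGuard() { out_ << ANSI_RESET << std::flush; }

    ColorResetGuard(const ColorResetGuard &) = delete;
    ColorResetGuard & operator=(const ColorResetGuard &) = delete;

private:
    std::ostream & out_;
};

// Writes a red message; the guard emits the reset on every return path.
void print_colored(std::ostream & out, const std::string & msg) {
    ColorResetGuard guard(out);
    out << ANSI_RED << msg;
}

// Captures the bytes that would reach the terminal, for inspection.
std::string demo_colored_output() {
    std::ostringstream buf;
    print_colored(buf, "error");
    return buf.str();
}
```

A scope guard like this only runs on normal scope exit or stack unwinding. It does not run if the process ends through `std::exit` or a signal, so those paths would need an explicit reset.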

Assets 23

cudart-llama-bin-win-cu11.7-x64.zip, 303 MB, 2025-01-13T04:04:24Z
cudart-llama-bin-win-cu12.4-x64.zip, 373 MB, 2025-01-13T04:04:30Z
llama-b4466-bin-macos-arm64.zip, 12.6 MB, 2025-01-13T04:04:38Z
llama-b4466-bin-macos-x64.zip, 13.6 MB, 2025-01-13T04:04:39Z
llama-b4466-bin-ubuntu-x64.zip, 15.5 MB, 2025-01-13T04:04:40Z
llama-b4466-bin-win-avx-x64.zip, 9.82 MB, 2025-01-13T04:04:40Z
llama-b4466-bin-win-avx2-x64.zip, 9.82 MB, 2025-01-13T04:04:41Z
llama-b4466-bin-win-avx512-x64.zip, 9.84 MB, 2025-01-13T04:04:41Z
llama-b4466-bin-win-cuda-cu11.7-x64.zip, 147 MB, 2025-01-13T04:04:42Z
llama-b4466-bin-win-cuda-cu12.4-x64.zip, 147 MB, 2025-01-13T04:04:46Z
Source code (zip), 2025-01-12T18:23:10Z
Source code (tar.gz), 2025-01-12T18:23:10Z

04 Dec 10:44

github-actions

b4263

253b7fd

Fix HF repo commit to clone lora test models (#10649)

Assets 22

08 Nov 07:48

github-actions

b4049

76c6e7f

server : minor UI fix (#10207)

Assets 22

16 Oct 09:20

github-actions

b3923

becfd38

[CANN] Fix cann compilation error (#9891)

Fix cann compilation error after merging llama.cpp supports dynamically loadable backends.

Assets 22

14 Oct 09:31

github-actions

b3918

ccc7bb7

fix memory leaks in minicpmv

Assets 22

14 Oct 09:00

github-actions

b3917

a89f75e

server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>

Assets 22

14 Oct 07:31

github-actions

b3916

13dca2a

Vectorize load instructions in dmmv f16 CUDA kernel (#9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

Assets 22

09 Oct 08:24

github-actions

b3899

dca1d4b

ggml : fix BLAS with unsupported types (#9775)

* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
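The first two bullets above can be pictured with a small sketch. It is not ggml's code: every name here (`example_type`, `example_type_traits`, `example_get_type_traits`, `example_can_use_blas`) is hypothetical, and the table contents are made up for illustration. It shows an accessor that returns a pointer into a static traits table instead of copying the struct, and a check that skips the BLAS path when a type has no `to_float` conversion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type ids; the real ggml enum is different and much larger.
enum example_type {
    EXAMPLE_TYPE_F32 = 0,
    EXAMPLE_TYPE_OPAQUE,   // stands in for a type with no float conversion
    EXAMPLE_TYPE_COUNT
};

typedef void (*to_float_fn)(const void * src, float * dst, std::size_t n);

struct example_type_traits {
    const char * name;
    std::size_t  type_size;
    to_float_fn  to_float;  // nullptr when the type cannot be converted to float
};

static void f32_to_float(const void * src, float * dst, std::size_t n) {
    const float * s = static_cast<const float *>(src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = s[i];
    }
}

static const example_type_traits k_example_traits[EXAMPLE_TYPE_COUNT] = {
    { "f32",    sizeof(float), f32_to_float },
    { "opaque", 1,             nullptr      },
};

// Returns a pointer into the static table, so callers never copy the struct.
const example_type_traits * example_get_type_traits(example_type type) {
    assert(type >= 0 && type < EXAMPLE_TYPE_COUNT);
    return &k_example_traits[type];
}

// A BLAS path that first converts its inputs to float can only run
// when the source type provides a to_float conversion.
bool example_can_use_blas(example_type type) {
    return example_get_type_traits(type)->to_float != nullptr;
}
```

Returning a `const` pointer keeps the table read-only for callers while avoiding a struct copy on each lookup.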

Assets 22

30 Sep 08:26

github-actions

b3848

c919d5d

ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau <[email protected]>

Assets 22

05 Sep 15:16

github-actions

b3669

4db0478

cuda : fix defrag with quantized KV (#9319)

Assets 19
