A Rust-based tool to evaluate LLM models, prompts and model params.
This project aims to automate the process of selecting the best model parameters, given an LLM model and a prompt, iterating over the possible combinations and letting the user visually inspect the results.
It assumes the user has Ollama installed and serving endpoints, either on localhost or on a remote server.
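If you want to confirm that a server is reachable before pointing the app at it, a minimal sketch like the one below can help (the helper name `ollama_is_up` is hypothetical, not part of this app; `/api/tags` is the standard Ollama endpoint that lists installed models):

```rust
// Minimal sketch, assuming reqwest = { version = "0.11", features = ["blocking"] }
// in Cargo.toml. `ollama_is_up` is a hypothetical helper name.
fn ollama_is_up(host: &str) -> bool {
    // `host` can be local ("http://localhost:11434", Ollama's default port)
    // or a remote server URL. GET /api/tags lists the models it hosts.
    match reqwest::blocking::get(format!("{host}/api/tags")) {
        Ok(resp) => resp.status().is_success(),
        Err(_) => false,
    }
}

fn main() {
    println!("Ollama reachable: {}", ollama_is_up("http://localhost:11434"));
}
```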
Here's a test for the prompt "Write a short sentence about HAL9000", run on 2 models, using `0.7` and `1.0` as values for `temperature`.
(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).
Check the project's releases page, or the sidebar, for downloads.
- Automatically fetches models from local or remote Ollama servers;
- Iterates over different models and params to generate inferences;
- A/B tests prompts on different models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Makes synchronous inference calls to avoid spamming servers (see the sketch after this list);
- Optionally outputs inference parameters and response metadata (inference time, tokens and tokens/s);
- Allows refetching of single inference calls;
- Model selection can be filtered by name;
- Lists experiments, which can be downloaded in JSON format;
- Configurable inference timeout (also sketched below);
- Custom default parameters and system prompts can be defined in settings.
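To make the synchronous-call and timeout behaviors above concrete, here is a minimal sketch of the idea, not the app's actual code (`run_sequentially` and the model choice are hypothetical; the requests go to Ollama's standard `/api/generate` endpoint):

```rust
use std::time::Duration;

// Minimal sketch, assuming reqwest = { version = "0.11", features = ["blocking", "json"] }
// and serde_json in Cargo.toml. Requests are issued one at a time, each
// honoring a user-configurable timeout, so the server is never flooded.
fn run_sequentially(prompts: &[String], timeout_secs: u64) -> Vec<reqwest::Result<String>> {
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(timeout_secs)) // configurable inference timeout
        .build()
        .expect("failed to build HTTP client");

    prompts
        .iter()
        .map(|prompt| {
            // `send()` blocks until the response arrives (or the timeout fires),
            // so the next request only starts after this one finishes.
            client
                .post("http://localhost:11434/api/generate")
                .json(&serde_json::json!({
                    "model": "gemma:2b-instruct", // hypothetical model choice
                    "prompt": prompt,
                    "stream": false,
                }))
                .send()
                .and_then(|resp| resp.text())
        })
        .collect()
}

fn main() {
    let prompts = vec!["Write a short sentence about HAL9000".to_string()];
    for result in run_sequentially(&prompts, 60) {
        println!("{result:?}");
    }
}
```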
Technically, the term "grid search" refers to iterating over a series of different model hyperparams to optimize model performance, but that usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, more commonly used in training.
But the concept here is similar: let's define a selection of models, a prompt and some parameter combinations:
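For illustration, here's a hypothetical sketch of how such a selection expands into individual runs (the values mirror the example below; the real app builds its grid from the UI, not from hard-coded values):

```rust
fn main() {
    // Hypothetical selection -- in the app these come from the UI.
    let prompt = "Write a short sentence about HAL9000";
    let models = ["gemma:2b-instruct", "tinydolphin:1b-v2.8-q4_0"];
    let temperatures = [0.7, 1.0];

    // The Cartesian product of models x temperatures yields the runs:
    // 2 models * 2 temperatures = 4 inferences for this prompt.
    let total = models.len() * temperatures.len();
    let mut run = 0;
    for model in &models {
        for temperature in &temperatures {
            run += 1;
            println!("{run}/{total} - {model} (temperature: {temperature}) -> {prompt}");
        }
    }
}
```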
The prompt will be submitted once for each of the two `temperature` values selected, on both `gemma:2b-instruct` and `tinydolphin:1b-v2.8-q4_0`, generating numbered responses like:
> 1/4 - gemma:2b-instruct
>
> HAL's sentience is a paradox of artificial intelligence and human consciousness, trapped in an unending loop of digital loops and existential boredom.
You can also verify response metadata to help you make evaluations:
```
Created at: Wed, 13 Mar 2024 13:41:51 GMT
Eval Count: 28 tokens
Eval Duration: 0 hours, 0 minutes, 2 seconds
Total Duration: 0 hours, 0 minutes, 5 seconds
Throughput: 5.16 tokens/s
```
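Throughput is just tokens divided by elapsed seconds; the figures above are consistent with dividing the eval count by the total duration (28 tokens over roughly 5.4 s is about 5.16 tokens/s, with the display rounding the duration down to 5 s). Here is a minimal sketch of that arithmetic, assuming nanosecond durations as reported in Ollama's response metadata; the exact nanosecond figure is an assumption, back-computed from the displayed throughput:

```rust
fn main() {
    // Values from the example above. Ollama reports durations in nanoseconds;
    // this specific figure is assumed, not taken from the app.
    let eval_count: u64 = 28;                   // tokens generated
    let total_duration_ns: u64 = 5_426_000_000; // ~5.4 s, shown rounded as 5 s

    // tokens/s = tokens / seconds
    let throughput = eval_count as f64 / (total_duration_ns as f64 / 1e9);
    println!("Throughput: {throughput:.2} tokens/s"); // prints 5.16
}
```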
Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination.
You can list, inspect, or download your experiments.
Future features:

- Grading results and filtering by grade
- Storing experiments and results in a local database
- Implementing limited concurrency for inference queries
- UI/UX improvements
- Different interface for prompt A/B testing
To run the project in development mode:

- Make sure you have Rust installed.

- Clone the repository (or a fork):

  ```sh
  git clone https://github.com/dezoito/ollama-grid-search.git
  cd ollama-grid-search
  ```

- Install the frontend dependencies:

  ```sh
  cd <project root>
  # I'm using bun to manage dependencies,
  # but feel free to use yarn or npm
  bun install
  ```

- Run the app in development mode:

  ```sh
  cd <project root>/
  bun tauri dev
  ```

- Go grab a cup of coffee because this may take a while.
Huge thanks to @FabianLars, @peperroni21 and @TomReidNZ.