Improve documentation (#21)
* Update CRLF to LF

https://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/

* docs: Update onboarding headlines
TilmanGriesel authored Jan 3, 2025
1 parent 30cfc87 commit 617246b
Showing 2 changed files with 23 additions and 15 deletions.
8 changes: 8 additions & 0 deletions .gitattributes
@@ -0,0 +1,8 @@
* text=auto
*.jpg binary
*.png binary
*.gif binary

# Git config
# git config core.eol lf
# git config core.autocrlf input
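
As a rough illustration of what the `* text=auto` rule is meant to achieve when a file is committed, the Python sketch below converts CRLF line endings to LF and leaves binary-looking content alone. The function name `normalize_eol` and the NUL-byte check are assumptions made for this example; Git's own text detection and conversion are more involved.

```python
def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF line endings to LF for text-like content.

    Simplified stand-in for line-ending normalization. The NUL-byte
    check is an assumed heuristic for spotting binary data, not Git's
    exact detection logic.
    """
    if b"\0" in data[:8000]:
        # Treat as binary (e.g. *.png, *.jpg) and leave untouched.
        return data
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    print(normalize_eol(b"line one\r\nline two\r\n"))
```

The explicit `*.jpg binary`, `*.png binary`, and `*.gif binary` entries mean Git does not have to guess for those file types at all.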
30 changes: 15 additions & 15 deletions docs/get-started.md
@@ -1,4 +1,4 @@
# Hey, welcome to Chipper :wave:
# Welcome to Chipper! :wave:

**Awesome to have you here!** You might be wondering, "Who or what is Chipper?" Let me give you a quick introduction.
Chipper started as a set of Python scripts designed to experiment with local RAGs (Retrieval-Augmented Generation). Essentially, we process local text files and enhance an AI model with that data.
@@ -19,35 +19,35 @@ Chipper essentially provides an end-to-end architecture for experimenting with e

</details>

## Step 1 - Basic Setup
## Step 1: Setting Up Chipper 🛠️

::: info
Everything mentioned here assumes some familiarity with the command line on your system. If you’re using Windows, consider using [MSYS](https://www.msys2.org/) or [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) to make things easier.
:::

### 1.1 - Docker
### 1.1 Install Docker 🐳

Alright, let’s get you set up! There’s one key requirement: [Docker](https://www.docker.com/). Chipper uses Docker to simplify the process and eliminate the need for a complex local setup on your machine.

- [If you're new to Docker, this will get you started](https://docs.docker.com/get-started/)

### 1.2 - Git
### 1.2 Install Git 🛡️

Secondly, you’ll need Git, a version control tool that’s also the inspiration behind GitHub’s name. If you don’t already have Git installed, no worries:

- [This guide will help you get started](https://docs.github.com/en/get-started/getting-started-with-git)

## Step 2 - Project Setup
## Step 2: Getting Started 🚀

### 2.1 - Clone
### 2.1 Clone the Repository 📂

To get the latest version of Chipper on your system, you’ll need to clone it locally. Simply run the following command:

```bash
git clone git@github.com:TilmanGriesel/chipper.git
```

### 2.2 - Start Chipper
### 2.2 Launch Chipper 🚦

Now we’re getting somewhere! Chipper uses [Docker Compose](https://docs.docker.com/compose/) to orchestrate the various components we need to work together, such as ElasticSearch and Chipper services. The best part? You don’t have to do much to get started, Chipper comes with a default configuration ready for experimentation.

@@ -65,25 +65,25 @@ cd chipper

> This step may take some time as [Docker](https://www.docker.com/) downloads all the required resources and compiles Chipper on your system.
## Step 3 - Let's Test
## Step 3: Testing Your Setup ✅

Let’s verify that everything is working as expected by importing some test data included with Chipper. During this process, we’ll also pull the embedding model from Ollama if it hasn’t been downloaded yet.

### 3.1.1 - Embed Testdata
### 3.1 Embed Test Data 📝

```bash
./run.sh embed-testdata
```

### 3.1.2 - Web Interface
### 3.2 Access the Web Interface 🌐

```bash
./run.sh browser
```

or open: `http://localhost:21200`

### 3.1.3 - Send Test-Query
### 3.3 Run a Test Query 🎯

```plain
Tell me a story about Chipper, the brilliant golden retriever.
@@ -95,11 +95,11 @@ Chipper will now respond using the test data embeddings we set up in the previou
You’ll likely see a message like `Starting to download model xy.z...`. Don’t worry, this only happens once for the default model. In the future, I plan to enhance this process with a progress bar or something similar. Once the download is complete, you can reload the page for a smoother experience.
:::

## Step 4 - Embed your own data
## Step 4: Embedding Your Own Data 📊

Congratulations! Now we’re diving into the details. Embeddings are organized into what’s called an `index`, which is essentially a label for a "drawer" where data or embeddings are stored. By default, Chipper uses an index named `default`. While embeddings and the web UI will automatically use this default, you can specify a different one if needed. Just remember, if you switch to another index, you’ll also need to select it in the web UI using the `/index myindex` command.

### 4.1 - Basic Embedding
### 4.1 Basic Embedding 🏗️

```bash
./run.sh embed /my/data/path
@@ -110,7 +110,7 @@ We can only embed text data, by default Chipper accepts:
> `.txt`, `.md`, `.py`, `.html`, `.js`, `.cpp`, `.hpp`, `.xml` extensions.
> You can change this whitelist by passing your own `--extensions` list.
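
To make the extension whitelist concrete, here is a minimal Python sketch of how files could be selected by extension before embedding. It is not Chipper's actual implementation: the function name `find_embeddable_files`, the recursive directory walk, and the case-insensitive matching are all assumptions for illustration.

```python
from pathlib import Path

# Default extensions as listed in the guide.
DEFAULT_EXTENSIONS = frozenset(
    {".txt", ".md", ".py", ".html", ".js", ".cpp", ".hpp", ".xml"}
)


def find_embeddable_files(root, extensions=DEFAULT_EXTENSIONS):
    """Return files under `root` whose suffix is in the whitelist.

    Hypothetical helper: recursion and case-insensitive matching are
    assumptions, not confirmed Chipper behavior.
    """
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in allowed
    )
```

Passing a different set, for example `find_embeddable_files("/my/data/path", {".rst"})`, mirrors the idea of supplying your own `--extensions` list.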
### 4.1 - Advanced Embedding
### 4.2 Advanced Embedding ⚙️

Now we’re ready to experiment! You can explore different splitting configurations to customize how text documents are divided. For example, you can use the `--split-by` argument to specify the method of splitting—options include "word," "sentence," "passage," "page," or "line." Adjust the `--split-length` to define the number of units per split, `--split-overlap` to set the number of units overlapping between splits, or `--split-threshold` to fine-tune the process further.
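
To get a feel for how split length and overlap interact, the following Python sketch splits text by words. It only illustrates the idea behind `--split-by word`, `--split-length`, and `--split-overlap`; the function name `split_by_words` is hypothetical, and Chipper's real splitter may differ, for instance in how it treats short trailing chunks via `--split-threshold`.

```python
def split_by_words(text: str, split_length: int, split_overlap: int = 0) -> list[str]:
    """Split `text` into chunks of `split_length` words.

    Consecutive chunks share `split_overlap` words. Illustrative
    sketch only, not Chipper's actual splitting code.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # how far each chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_by_words("one two three four five", 3, 1):
        print(chunk)
```

With a length of 3 and an overlap of 1, each chunk starts two words after the previous one, so the last word of one chunk is repeated as the first word of the next.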

@@ -122,7 +122,7 @@ You can set the index using the `--es-index <name>` parameter, specify the embed
./tools/embed/run.sh --help
```

## Step 5 - Where to go from here?
## Step 5: Next Steps and Exploration 🔍

First off, if you’ve made it this far, let me unravel the mystery behind why Chipper is called Chipper the Golden Retriever. For starters, I adore golden retrievers! But there’s more to it: they love to "chip" wood, just like we need to split and chip the data we want to embed. And as for "retriever", - well ...
