Commit
* Update CRLF to LF https://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/
* docs: Update onboarding headlines
1 parent 30cfc87, commit 617246b
Showing 2 changed files with 23 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
* text=auto | ||
*.jpg binary | ||
*.png binary | ||
*.gif binary | ||
|
||
# Git config | ||
# git config core.eol lf | ||
# git config core.autocrlf input |
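
The attributes above ask Git to treat detected text files as text and store them with LF line endings, while leaving the listed image types untouched. As a rough illustration of what "CRLF to LF" means at the byte level, here is a minimal Python sketch. It is not Git's implementation, and the helper name `to_lf` is hypothetical.

```python
def to_lf(data: bytes) -> bytes:
    """Replace Windows-style CRLF line endings with LF.

    Illustration only: Git additionally decides whether a file is text
    or binary before converting anything; this helper does not.
    """
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    sample = b"first line\r\nsecond line\r\n"
    print(to_lf(sample))
```

Applying such a conversion to binary files like images would corrupt them, which is why the `binary` entries are listed explicitly.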
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Hey, welcome to Chipper :wave: | ||
# Welcome to Chipper! :wave: | ||
|
||
**Awesome to have you here!** You might be wondering, "Who or what is Chipper?" Let me give you a quick introduction. | ||
Chipper started as a set of Python scripts designed to experiment with local RAGs (Retrieval-Augmented Generation). Essentially, we process local text files and enhance an AI model with that data. | ||
|
@@ -19,35 +19,35 @@ Chipper essentially provides an end-to-end architecture for experimenting with e | |
|
||
</details> | ||
|
||
## Step 1 - Basic Setup | ||
## Step 1: Setting Up Chipper 🛠️ | ||
|
||
::: info | ||
Everything mentioned here assumes some familiarity with the command line on your system. If you’re using Windows, consider using [MSYS](https://www.msys2.org/) or [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) to make things easier. | ||
::: | ||
|
||
### 1.1 - Docker | ||
### 1.1 Install Docker 🐳 | ||
|
||
Alright, let’s get you set up! There’s one key requirement: [Docker](https://www.docker.com/). Chipper uses Docker to simplify the process and eliminate the need for a complex local setup on your machine. | ||
|
||
- [If you're new to Docker, this will get you started](https://docs.docker.com/get-started/) | ||
|
||
### 1.2 - Git | ||
### 1.2 Install Git 🛡️ | ||
|
||
Secondly, you’ll need Git, a version control tool that’s also the inspiration behind GitHub’s name. If you don’t already have Git installed, no worries: | ||
|
||
- [This guide will help you get started](https://docs.github.com/en/get-started/getting-started-with-git) | ||
|
||
## Step 2 - Project Setup | ||
## Step 2: Getting Started 🚀 | ||
|
||
### 2.1 - Clone | ||
### 2.1 Clone the Repository 📂 | ||
|
||
To get the latest version of Chipper on your system, you’ll need to clone it locally. Simply run the following command: | ||
|
||
```bash | ||
git clone git@github.com:TilmanGriesel/chipper.git | ||
``` | ||
|
||
### 2.2 - Start Chipper | ||
### 2.2 Launch Chipper 🚦 | ||
|
||
Now we’re getting somewhere! Chipper uses [Docker Compose](https://docs.docker.com/compose/) to orchestrate the various components we need to work together, such as ElasticSearch and Chipper services. The best part? You don’t have to do much to get started, Chipper comes with a default configuration ready for experimentation. | ||
|
||
|
@@ -65,25 +65,25 @@ cd chipper | |
|
||
> This step may take some time as [Docker](https://www.docker.com/) downloads all the required resources and compiles Chipper on your system. | ||
## Step 3 - Let's Test | ||
## Step 3: Testing Your Setup ✅ | ||
|
||
Let’s verify that everything is working as expected by importing some test data included with Chipper. During this process, we’ll also pull the embedding model from Ollama if it hasn’t been downloaded yet. | ||
|
||
### 3.1.1 - Embed Testdata | ||
### 3.1 Embed Test Data 📝 | ||
|
||
```bash | ||
./run.sh embed-testdata | ||
``` | ||
|
||
### 3.1.2 - Web Interface | ||
### 3.2 Access the Web Interface 🌐 | ||
|
||
```bash | ||
./run.sh browser | ||
``` | ||
|
||
or open: `http://localhost:21200` | ||
|
||
### 3.1.3 - Send Test-Query | ||
### 3.3 Run a Test Query 🎯 | ||
|
||
```plain | ||
Tell me a story about Chipper, the brilliant golden retriever. | ||
|
@@ -95,11 +95,11 @@ Chipper will now respond using the test data embeddings we set up in the previou | |
You’ll likely see a message like `Starting to download model xy.z...`. Don’t worry, this only happens once for the default model. In the future, I plan to enhance this process with a progress bar or something similar. Once the download is complete, you can reload the page for a smoother experience. | ||
::: | ||
|
||
## Step 4 - Embed your own data | ||
## Step 4: Embedding Your Own Data 📊 | ||
|
||
Congratulations! Now we’re diving into the details. Embeddings are organized into what’s called an `index`, which is essentially a label for a "drawer" where data or embeddings are stored. By default, Chipper uses an index named `default`. While embeddings and the web UI will automatically use this default, you can specify a different one if needed. Just remember, if you switch to another index, you’ll also need to select it in the web UI using the `/index myindex` command. | ||
|
||
### 4.1 - Basic Embedding | ||
### 4.1 Basic Embedding 🏗️ | ||
|
||
```bash | ||
./run.sh embed /my/data/path | ||
|
@@ -110,7 +110,7 @@ We can only embed text data, by default Chipper accepts: | |
> `.txt`, `.md`, `.py`, `.html`, `.js`, `.cpp`, `.hpp`, `.xml` extensions. | ||
> You can change this whitelist by passing your own `--extensions` list. | ||
### 4.1 - Advanced Embedding | ||
### 4.2 Advanced Embedding ⚙️ | ||
|
||
Now we’re ready to experiment! You can explore different splitting configurations to customize how text documents are divided. For example, you can use the `--split-by` argument to specify the method of splitting—options include "word," "sentence," "passage," "page," or "line." Adjust the `--split-length` to define the number of units per split, `--split-overlap` to set the number of units overlapping between splits, or `--split-threshold` to fine-tune the process further. | ||
|
||
|
@@ -122,7 +122,7 @@ You can set the index using the `--es-index <name>` parameter, specify the embed | |
./tools/embed/run.sh --help | ||
``` | ||
|
||
## Step 5 - Where to go from here? | ||
## Step 5: Next Steps and Exploration 🔍 | ||
|
||
First off, if you’ve made it this far, let me unravel the mystery behind why Chipper is called Chipper the Golden Retriever. For starters, I adore golden retrievers! But there’s more to it: they love to "chip" wood, just like we need to split and chip the data we want to embed. And as for "retriever", - well ... | ||
|
||
|
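
To make the splitting options described in the Advanced Embedding step more concrete, here is a minimal Python sketch of word-based splitting with a fixed length and overlap. It only illustrates the general idea behind `--split-by word`, `--split-length`, and `--split-overlap`. The function `split_words` and its defaults are hypothetical and are not taken from Chipper's code, which may behave differently.

```python
def split_words(text: str, split_length: int = 5, split_overlap: int = 1) -> list[str]:
    """Split text into chunks of `split_length` words.

    Consecutive chunks share `split_overlap` words. Hypothetical helper
    for illustration; not Chipper's actual splitter.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and < split_length")

    words = text.split()
    step = split_length - split_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once a chunk has reached the end of the text.
        if start + split_length >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_words("Chipper is a brilliant golden retriever who loves wood", 4, 1):
        print(chunk)
```

A larger overlap keeps more shared context between neighboring chunks at the cost of producing more chunks to embed.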