Commit
Added llama-index integration (#155)
* Added llama-index integration
* Updated following code review
Showing 10 changed files with 717 additions and 2 deletions.
239 changes: 239 additions & 0 deletions
apps/opik-documentation/documentation/docs/cookbook/llama-index.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,239 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Using Opik with LlamaIndex\n", | ||
"\n", | ||
"This notebook showcases how to use Opik with LlamaIndex. [LlamaIndex](https://github.com/run-llama/llama_index) is a flexible data framework for building LLM applications:\n", | ||
"> LlamaIndex is a \"data framework\" to help you build LLM apps. It provides the following tools:\n", | ||
">\n", | ||
"> - Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).\n", | ||
"> - Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.\n", | ||
"> - Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.\n", | ||
"> - Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).\n", | ||
"\n", | ||
"For this guide, we will download Paul Graham's essays and use them as our data source. We will then query these essays with LlamaIndex." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating an account on Comet.com\n", | ||
"\n", | ||
"[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.\n", | ||
"\n", | ||
"> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import getpass\n", | ||
"\n", | ||
"os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n", | ||
"os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If you are running the Opik platform locally, simply set:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# import os\n", | ||
"# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Preparing our environment\n", | ||
"\n", | ||
"First, we will install the necessary libraries, set up our API keys, and download the Paul Graham essays." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install opik llama-index llama-index-agent-openai llama-index-llms-openai --quiet" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"And configure the required environment variables:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import getpass\n", | ||
"\n", | ||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In addition, we will download the Paul Graham essays:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"import requests\n", | ||
"\n", | ||
"# Create directory if it doesn't exist\n", | ||
"os.makedirs('./data/paul_graham/', exist_ok=True)\n", | ||
"\n", | ||
"# Download the file using requests\n", | ||
"url = 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt'\n", | ||
"response = requests.get(url, timeout=30)\n", | ||
"response.raise_for_status()  # Fail early on HTTP errors\n", | ||
"with open('./data/paul_graham/paul_graham_essay.txt', 'wb') as f:\n", | ||
"    f.write(response.content)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Using LlamaIndex\n", | ||
"\n", | ||
"### Configuring the Opik integration" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To enable the Opik integration, register the Opik callback handler in LlamaIndex's global `Settings`:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from llama_index.core import Settings\n", | ||
"from llama_index.core.callbacks import CallbackManager\n", | ||
"from opik.integrations.llama_index import LlamaIndexCallbackHandler\n", | ||
"\n", | ||
"opik_callback_handler = LlamaIndexCallbackHandler()\n", | ||
"Settings.callback_manager = CallbackManager([opik_callback_handler])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now that the callback handler is configured, all traces will automatically be logged to Opik.\n", | ||
"\n", | ||
"### Using LlamaIndex" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The first step is to load the data into LlamaIndex. We will use the `SimpleDirectoryReader` to load the data from the `data/paul_graham` directory. We will also create the vector store to index all the loaded documents." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n", | ||
"\n", | ||
"documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n", | ||
"index = VectorStoreIndex.from_documents(documents)\n", | ||
"query_engine = index.as_query_engine()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can now query the index using the `query_engine` object:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 in 9th grade, using an early version of Fortran. Later, the author transitioned to working with microcomputers, building a TRS-80 and writing simple games and programs. Despite enjoying programming, the author initially planned to study philosophy in college but eventually switched to AI due to a lack of interest in philosophy courses.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"response = query_engine.query(\"What did the author do growing up?\")\n", | ||
"print(response)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"You can now go to the Opik app to see the trace:\n", | ||
"\n", | ||
"![LlamaIndex trace in Opik](/img/cookbook/llamaIndex_cookbook.png)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "py312_llm_eval", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
116 changes: 116 additions & 0 deletions
apps/opik-documentation/documentation/docs/cookbook/llama-index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Using Opik with LlamaIndex | ||
|
||
This notebook showcases how to use Opik with LlamaIndex. [LlamaIndex](https://github.com/run-llama/llama_index) is a flexible data framework for building LLM applications: | ||
> LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools: | ||
> | ||
> - Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.). | ||
> - Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs. | ||
> - Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output. | ||
> - Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else). | ||
For this guide, we will download Paul Graham's essays and use them as our data source. We will then query these essays with LlamaIndex. | ||
|
||
## Creating an account on Comet.com | ||
|
||
[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform. [Simply create an account](https://www.comet.com/signup?from=llm) and grab your API key. | ||
|
||
> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information. | ||
|
||
```python | ||
import os | ||
import getpass | ||
|
||
os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ") | ||
os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ") | ||
``` | ||
|
||
If you are running the Opik platform locally, simply set: | ||
|
||
|
||
```python | ||
# import os | ||
# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api" | ||
``` | ||
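
The hosted and local setups above can be folded into a single helper. The following is a minimal sketch, not part of the Opik SDK: the function name `configure_opik` and its parameters are hypothetical, and it only writes the environment variables already shown in this guide.

```python
import os


def configure_opik(use_local, api_key=None, workspace=None,
                   local_url="http://localhost:5173/api"):
    """Set the Opik environment variables used in this guide.

    `configure_opik` is a hypothetical helper for illustration; it is not
    part of the Opik SDK. It only writes the variables shown above.
    """
    if use_local:
        # Local deployment: point the SDK at the local API endpoint.
        os.environ["OPIK_URL_OVERRIDE"] = local_url
    else:
        # Hosted deployment: both an API key and a workspace are needed.
        if not api_key or not workspace:
            raise ValueError(
                "api_key and workspace are required for the hosted platform"
            )
        os.environ["OPIK_API_KEY"] = api_key
        os.environ["OPIK_WORKSPACE"] = workspace
```

For the hosted platform, you would call it with values read via `getpass.getpass` and `input`, as in the first cell; for a local deployment, `configure_opik(use_local=True)` should be enough.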
|
||
## Preparing our environment | ||
|
||
First, we will install the necessary libraries, set up our API keys, and download the Paul Graham essays. | ||
|
||
|
||
```python | ||
%pip install opik llama-index llama-index-agent-openai llama-index-llms-openai --quiet | ||
``` | ||
|
||
And configure the required environment variables: | ||
|
||
|
||
```python | ||
import os | ||
import getpass | ||
|
||
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") | ||
``` | ||
|
||
In addition, we will download the Paul Graham essays: | ||
|
||
|
||
```python | ||
import os | ||
import requests | ||
|
||
# Create directory if it doesn't exist | ||
os.makedirs('./data/paul_graham/', exist_ok=True) | ||
|
||
# Download the file using requests | ||
url = 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' | ||
response = requests.get(url, timeout=30) | ||
response.raise_for_status()  # Fail early on HTTP errors | ||
with open('./data/paul_graham/paul_graham_essay.txt', 'wb') as f: | ||
    f.write(response.content) | ||
``` | ||
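
When the notebook is re-run, the download step above fetches the file again each time. A possible variation is sketched below, under a few assumptions: the helper name `download_if_missing` and its `fetch` hook are hypothetical, and the sketch uses the standard library's `urllib` instead of `requests` so that it has no third-party dependency.

```python
import os
import urllib.request


def download_if_missing(url, path, fetch=None):
    """Download `url` to `path` unless the file already exists.

    Returns True if a download happened, False if the file was reused.
    `fetch` is a hypothetical hook (url -> bytes) so the function can be
    exercised without network access; by default it uses urllib.
    """
    if os.path.exists(path):
        return False

    if fetch is None:
        def fetch(u):
            # urlopen raises HTTPError for 4xx/5xx responses.
            with urllib.request.urlopen(u, timeout=30) as resp:
                return resp.read()

    data = fetch(url)
    # Create the parent directory only once the content is available.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return True
```

Called with the essay URL and `./data/paul_graham/paul_graham_essay.txt`, this would download the file on the first run and reuse it afterwards.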
|
||
## Using LlamaIndex | ||
|
||
### Configuring the Opik integration | ||
|
||
To enable the Opik integration, register the Opik callback handler in LlamaIndex's global `Settings`: | ||
|
||
|
||
```python | ||
from llama_index.core import Settings | ||
from llama_index.core.callbacks import CallbackManager | ||
from opik.integrations.llama_index import LlamaIndexCallbackHandler | ||
|
||
opik_callback_handler = LlamaIndexCallbackHandler() | ||
Settings.callback_manager = CallbackManager([opik_callback_handler]) | ||
``` | ||
|
||
Now that the callback handler is configured, all traces will automatically be logged to Opik. | ||
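
To give an intuition for why registering a handler is enough for traces to be logged, here is a toy sketch of the callback pattern. It is not LlamaIndex's or Opik's implementation: `RecordingHandler`, `ToyCallbackManager`, and `toy_query` are hypothetical names, and the "query" is a stand-in that only upper-cases its input.

```python
class RecordingHandler:
    """Hypothetical handler that stores every event it receives."""

    def __init__(self):
        self.events = []

    def on_event(self, name, payload):
        self.events.append((name, payload))


class ToyCallbackManager:
    """Hypothetical manager that forwards each event to all handlers."""

    def __init__(self, handlers):
        self.handlers = list(handlers)

    def emit(self, name, payload):
        for handler in self.handlers:
            handler.on_event(name, payload)


def toy_query(manager, question):
    # The framework code emits events; the caller never touches the handler.
    manager.emit("query_start", {"question": question})
    answer = question.upper()  # stand-in for retrieval plus an LLM call
    manager.emit("query_end", {"answer": answer})
    return answer
```

The point of the sketch is that the calling code only runs `toy_query`; every registered handler sees the start and end events without any extra work, which mirrors how a tracing handler can observe each query.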
|
||
### Using LlamaIndex | ||
|
||
The first step is to load the data into LlamaIndex. We will use the `SimpleDirectoryReader` to load the data from the `data/paul_graham` directory. We will also create the vector store to index all the loaded documents. | ||
|
||
|
||
```python | ||
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader | ||
|
||
documents = SimpleDirectoryReader("./data/paul_graham").load_data() | ||
index = VectorStoreIndex.from_documents(documents) | ||
query_engine = index.as_query_engine() | ||
``` | ||
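
Conceptually, a vector index embeds each chunk of text and, at query time, returns the chunks whose vectors are closest to the query vector. The sketch below illustrates that idea only: it swaps in word counts for a learned embedding model, and the names `embed`, `cosine`, and `top_k` are hypothetical, not LlamaIndex APIs.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in embedding: lowercase word counts (real setups use a learned model).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    # Rank chunks by similarity to the query and keep the k best.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In a real pipeline, the retrieved chunks would then be passed to the LLM as context for answering the question.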
|
||
We can now query the index using the `query_engine` object: | ||
|
||
|
||
```python | ||
response = query_engine.query("What did the author do growing up?") | ||
print(response) | ||
``` | ||
|
||
The author worked on writing short stories and programming, starting with early attempts on an IBM 1401 in 9th grade, using an early version of Fortran. Later, the author transitioned to working with microcomputers, building a TRS-80 and writing simple games and programs. Despite enjoying programming, the author initially planned to study philosophy in college but eventually switched to AI due to a lack of interest in philosophy courses. | ||
|
||
|
||
You can now go to the Opik app to see the trace: | ||
|
||
![LlamaIndex trace in Opik](/img/cookbook/llamaIndex_cookbook.png) |