LlamaIndex is a popular data framework for developing context-augmented LLM apps. RAG API is a new LlamaIndex-powered API that provides everything you need for retrieval-augmented generation (RAG): indexing, embedding, retrieval, and generation.
There are a few reasons why you might consider LlamaIndex with RAG API:
- Easy to set up and use.
- Connects to a range of data sources: local files, Google Cloud Storage, Google Drive.
- Supports a number of different file types: Google docs, drawings, slides, HTML, JSON, markdown, PPTX, DOCX, PDF, and text files.
- Supports a number of customizations: chunk_size and chunk_overlap for ingestion, similarity_top_k and vector_distance_threshold for retrieval.
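To get a feel for what chunk_size and chunk_overlap control at ingestion time, here is a minimal sketch of overlapping chunking. It is only an illustration: the helper chunk_tokens is hypothetical, and splitting on whitespace is an assumption made for readability, since the RAG API uses its own tokenizer and chunking logic.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Assumption: whitespace-separated words stand in for real tokens.
words = "the cargo area is located in the trunk of the vehicle".split()
chunks = chunk_tokens(words, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

A larger overlap repeats more text across neighboring chunks, which helps when an answer straddles a chunk boundary but increases the number of chunks to embed.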
Imagine you own the 2024 model of a fictitious vehicle called Cymbal Starlight. It has a user’s manual in PDF format (cymbal-starlight-2024.pdf), and you want to ask an LLM questions about the vehicle based on the manual.
LLMs are not trained on this user manual, so they can't answer questions about the vehicle on their own. Let's see how to use the RAG API to augment our LLM.
There's a main.py sample that shows how to use the RAG API.
First, let's ask a question to the LLM about the vehicle without any RAG:
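# Assumes vertexai.init(project=..., location=...) has already been called.
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel(model_name="gemini-1.5-flash-001")
response = model.generate_content(
    "What is the cargo capacity of Cymbal Starlight?",
    generation_config=GenerationConfig(temperature=0.0),
)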
Run it:
python main.py --project your-project-id generate_text \
--prompt "What is the cargo capacity of Cymbal Starlight?"
You get a response like this:
Prompt: What is the cargo capacity of Cymbal Starlight?
Response text: I do not have access to real-time information, including specific details
about ships like the "Cymbal Starlight."
Not surprisingly, the LLM does not know about the vehicle.
Before you can ingest a PDF, you need to create a RAG corpus, which is an index that you import or upload documents into:
from vertexai.preview import rag

corpus = rag.create_corpus(display_name=corpus_display_name)
Run it:
python main.py --project_id your-project-id create_corpus --display_name cymbal-starlight-corpus
Corpus created: projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064
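The corpus name printed above is a full resource name, and you'll pass it to every later command. If you ever need its parts separately, a small helper like the one below can split it. This is just a sketch: parse_corpus_name is a hypothetical helper that is not part of the RAG API, and the pattern is inferred from the output above.

```python
import re

# Pattern inferred from the "Corpus created" output; treat it as an assumption.
CORPUS_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/ragCorpora/(?P<corpus_id>[^/]+)$"
)


def parse_corpus_name(name):
    """Return the project, location, and corpus ID from a corpus resource name."""
    match = CORPUS_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a RAG corpus resource name: {name!r}")
    return match.groupdict()


parts = parse_corpus_name(
    "projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064"
)
print(parts["project"], parts["location"], parts["corpus_id"])
```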
Next, you need to either upload a file to the corpus directly or host it on Google Cloud Storage or Google Drive and point to it.
In this case, let's upload the fictitious cymbal-starlight-2024.pdf user manual file.
rag_file = rag.upload_file(
    corpus_name=corpus_name,
    path=path,
    display_name=display_name,
    description=description,
)
Run it:
python main.py --project_id your-project-id upload_file \
--corpus_name projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064 \
--path cymbal-starlight-2024.pdf --display_name cymbal-starlight-2024.pdf
File upload to corpus: projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064
-name: projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064/ragFiles/8935141660703064064
display_name: cymbal-starlight-2024.pdf
At this point, you can retrieve directly from the corpus to get the top k relevant documents or chunks:
response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=corpus_name,
            # Supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text="What is the cargo capacity of Cymbal Starlight?",
    similarity_top_k=10,  # Optional
    vector_distance_threshold=0.5,  # Optional
)
Run it:
python main.py --project_id your-project-id direct_retrieve \
--corpus_name projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064 \
--text "What is the cargo capacity of Cymbal Starlight?"
And you should get back a list of chunks:
Text: What is the cargo capacity of Cymbal Starlight?
Response: contexts {
contexts {
source_uri: "cymbal-starlight-2024.pdf"
text: "This light may illuminate for a variety of reasons, ..."
distance: 0.37499325722150889
},
contexts {
...
}
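Notice the distance field on each chunk. To build some intuition for how similarity_top_k and vector_distance_threshold interact, here is a toy sketch of distance-based retrieval. It is not how the RAG API is implemented: the retrieve helper, the two-dimensional "embeddings", the use of cosine distance, and the inclusive threshold comparison are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, similarity_top_k=10, vector_distance_threshold=0.5):
    """Keep chunks within the distance threshold, closest first, capped at top k."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] <= vector_distance_threshold]
    kept.sort(key=lambda pair: pair[0])
    return kept[:similarity_top_k]


# Toy vectors standing in for real embeddings (hypothetical values).
toy_chunks = [
    ("tire pressure", [0.0, 1.0]),
    ("warning light", [1.0, 1.0]),
    ("cargo capacity", [1.0, 0.0]),
    ("seat heating", [-1.0, 0.0]),
]
results = retrieve([1.0, 0.0], toy_chunks)
for distance, text in results:
    print(f"{distance:.3f} {text}")
```

In this sketch the threshold drops chunks that are too far from the query, and top k then limits how many of the remaining chunks are returned.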
Finally, we're ready to ground the LLM with the RAG corpus and ask questions about the vehicle.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerationConfig, GenerativeModel, Tool

model = GenerativeModel(model_name="gemini-1.5-flash-001")
tools = [
    Tool.from_retrieval(
        retrieval=rag.Retrieval(
            source=rag.VertexRagStore(
                rag_resources=[
                    rag.RagResource(
                        rag_corpus=corpus_name,  # Currently only 1 corpus is allowed.
                        # Supply IDs from `rag.list_files()`.
                        # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                    )
                ],
                similarity_top_k=3,  # Optional
                vector_distance_threshold=0.5,  # Optional
            ),
        )
    )
]
response = model.generate_content(
    "What is the cargo capacity of Cymbal Starlight?",
    tools=tools,
    generation_config=GenerationConfig(
        temperature=0.0,
    ),
)
Run it:
python main.py --project your-project-id generate_text \
--corpus_name projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064 \
--prompt "What is the cargo capacity of Cymbal Starlight?"
You get a response like this:
Corpus name: projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064
Prompt: What is the cargo capacity of Cymbal Starlight?
Response text: The Cymbal Starlight has a cargo capacity of 13.5 cubic feet. The cargo area is located in the trunk of the vehicle.
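Conceptually, grounding means the model answers with retrieved chunks in front of it. The retrieval tool takes care of this for you, so you never build such a prompt yourself. Still, a hand-rolled sketch can make the idea concrete. Everything here is an assumption for illustration: build_grounded_prompt is a hypothetical helper, the prompt wording is made up, and the sample contexts reuse text from the outputs above rather than real retrieval results.

```python
def build_grounded_prompt(question, contexts, max_contexts=3):
    """Prepend the first max_contexts retrieved chunks to the question."""
    lines = ["Answer the question using only the context below.", ""]
    for index, context in enumerate(contexts[:max_contexts], start=1):
        # Number each chunk and show its source so an answer can be traced back.
        lines.append(f"[{index}] ({context['source_uri']}) {context['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Illustrative contexts only; real ones come from the retrieval step.
sample_contexts = [
    {
        "source_uri": "cymbal-starlight-2024.pdf",
        "text": "The Cymbal Starlight has a cargo capacity of 13.5 cubic feet.",
    },
    {
        "source_uri": "cymbal-starlight-2024.pdf",
        "text": "This light may illuminate for a variety of reasons, ...",
    },
]
prompt = build_grounded_prompt(
    "What is the cargo capacity of Cymbal Starlight?", sample_contexts
)
print(prompt)
```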
Let's ask another question:
python main.py --project your-project-id generate_text \
--corpus_name projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064 \
--prompt "What's the emergency roadside assistance phone number?"
Response:
Corpus name: projects/207195257545/locations/us-central1/ragCorpora/8935141660703064064
Prompt: What's the emergency roadside assistance phone number?
Response text: The emergency roadside assistance phone number is 1-800-555-1212.
Yay, it works!