
Search Tools Analysis: A Comparative Study

This project explores and benchmarks OpenSearch, Pinecone, and ChromaDB as vector databases in a Retrieval-Augmented Generation (RAG) system. Using AWS Bedrock for embedding generation, the study evaluates performance metrics such as response time, relevance, and indexing efficiency to provide actionable insights for practical applications.

Project Features

  • Keyword-Based Search: Utilizes OpenSearch for traditional Lucene-based retrieval.
  • RAG-Based Search: Implements semantic search with vector databases:
    • OpenSearch (with k-NN plugin)
    • Pinecone
    • ChromaDB
  • Benchmarking Metrics:
    • Response time
    • Precision@5 for relevance (see the sketch after this list)
    • Indexing performance
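
Precision@5 here is the fraction of the top five retrieved records judged relevant to the query. A minimal sketch of how it can be computed (the function name and the relevance-judgment format are illustrative, not taken from the repository):

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved document ids that appear in the relevant set."""
    top_k = retrieved_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in set(relevant_ids)) / k

# Example: 4 of the top 5 results are relevant -> 0.8
print(precision_at_k(["d1", "d2", "d3", "d4", "d5"], {"d1", "d2", "d3", "d5"}))
```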

Dataset

The dataset contains 1,338 records with the following fields:

  • age: Age of the individual.
  • sex: Gender (male/female).
  • bmi: Body Mass Index (numeric).
  • children: Number of dependents.
  • smoker: Smoker status (yes/no).
  • region: Geographic region (e.g., southeast).
  • charges: Medical insurance charges.
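
For reference, a single record looks roughly like the one below (the values are illustrative, not copied from the dataset). Before embedding, the structured record has to be flattened into a text string; the exact serialization scheme used in the repository is an assumption here:

```python
record = {
    "age": 34, "sex": "male", "bmi": 28.5, "children": 2,
    "smoker": "no", "region": "southeast", "charges": 4512.37,
}

# Flatten the structured record into one string so it can be embedded as text.
text = ", ".join(f"{key}: {value}" for key, value in record.items())
# -> "age: 34, sex: male, bmi: 28.5, children: 2, smoker: no, region: southeast, charges: 4512.37"
```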

Process

  1. Data Ingestion:

    • Preprocessed records are converted into embeddings with AWS Bedrock's Titan embedding model (see the ingestion sketch after this list).
    • The embeddings are indexed into OpenSearch, Pinecone, and ChromaDB.
  2. Inference:

    • A Flask API exposes four endpoints for querying (see the route sketch after this list):
      • /search/free: Keyword-based search with OpenSearch.
      • /search/rag/opensearch: RAG-based search using OpenSearch.
      • /search/rag/pinecone: RAG-based search using Pinecone.
      • /search/rag/chroma: RAG-based search using ChromaDB.
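
As a rough sketch of the ingestion step, the snippet below generates a Titan embedding through the Bedrock runtime API and writes it into a local ChromaDB collection. The model id, AWS region, and collection name are assumptions rather than values taken from the repository; the OpenSearch and Pinecone writers follow the same pattern with their respective clients.

```python
import json

import boto3
import chromadb

# Bedrock runtime client; region and model id are assumptions, not taken from the repo.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Return a Titan embedding vector for one text record."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

# Local ChromaDB collection (path and collection name are illustrative).
chroma = chromadb.PersistentClient(path="./chroma_store")
collection = chroma.get_or_create_collection("insurance_records")

doc = "age: 34, sex: male, bmi: 28.5, children: 2, smoker: no, region: southeast, charges: 4512.37"
collection.add(ids=["record-0"], documents=[doc], embeddings=[embed(doc)])
```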
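
And a minimal sketch of the inference side: a Flask route that embeds the incoming query with the same Titan model and runs a nearest-neighbour lookup in ChromaDB. It reuses embed() and collection from the ingestion sketch above; the query parameter name (q), port, and result count are assumptions about index.py, and the OpenSearch and Pinecone routes would swap in their own query calls.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/search/rag/chroma")
def search_rag_chroma():
    # "q" as the query parameter name is an assumption about the actual API.
    query = request.args.get("q", "")
    # embed() and collection come from the ingestion sketch above.
    results = collection.query(query_embeddings=[embed(query)], n_results=5)
    return jsonify(results["documents"])

if __name__ == "__main__":
    app.run(port=5000)
```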

Key Results

| System     | Avg Indexing Time (100 docs) | Avg Response Time (ms) | Precision@5 | Relevance |
|------------|------------------------------|------------------------|-------------|-----------|
| OpenSearch | ~15,000 ms                   | 252                    | 0.8         | 4/5       |
| Pinecone   | ~20,000 ms                   | 335                    | 0.9         | 4.5/5     |
| ChromaDB   | ~8,000 ms                    | 167                    | 0.7         | 3.5/5     |

Technologies Used

  • Programming Language: Python
  • Frameworks & SDKs: Flask, boto3 (AWS SDK for Python)
  • Cloud Services: AWS Bedrock, S3, OpenSearch
  • Vector Databases:
    • Pinecone (managed vector database)
    • ChromaDB (open-source, local)
  • Libraries:
    • opensearch-py
    • pinecone-client
    • chromadb

How to Run

  1. Clone the Repository:

git clone <repo_url>
cd <repo_name>

  2. Install Dependencies:

pip install -r requirements.txt

  3. Set Up Environment Variables:

Configure AWS credentials, vector database keys, and OpenSearch settings in a .env file (a sample sketch is shown below).
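
The exact variable names depend on how index.py reads its configuration; the following is only a sketch of the kind of values a .env file would hold:

```
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
OPENSEARCH_HOST=https://your-opensearch-endpoint
OPENSEARCH_USER=...
OPENSEARCH_PASSWORD=...
PINECONE_API_KEY=...
```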

  4. Run the Flask Server:

python index.py

  5. Test the Endpoints:

/search/free
/search/rag/opensearch
/search/rag/pinecone
/search/rag/chroma
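
For example, once the server is running (Flask defaults to port 5000), an endpoint can be queried with curl; the q parameter name is an assumption about how index.py reads the query:

```
curl "http://localhost:5000/search/rag/chroma?q=average+charges+for+smokers+in+the+southeast"
```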
