Hybrid Search of full text and vector queries that execute against a search index containing both searchable plain text content and generated embeddings

extrawest/pinecone_hybrid_search

Langchain Pinecone Hybrid Search Showcase

PROJECT INFO

  • LangChain
  • Pinecone as the vector database
  • HuggingFace all-MiniLM-L6-v2 for embeddings
  • BM25 with mmh3 hashing encoder
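
To make the last item concrete, here is a minimal sketch of a hashing sparse encoder with BM25-style term weighting. It is not the encoder this project uses: `zlib.crc32` stands in for mmh3 so the example needs only the standard library, the tokenizer is a naive whitespace split, and the names `hash_token`, `tokenize`, `encode_document` and the constants `k1=1.2`, `b=0.75` are assumptions chosen for illustration.

```python
import zlib
from collections import Counter

PUNCTUATION = ".,!?'\""


def hash_token(token: str) -> int:
    """Map a token to a 32-bit integer index.

    ASSUMPTION: CRC32 is a stand-in here; it is not MurmurHash3 (mmh3).
    """
    return zlib.crc32(token.encode("utf-8"))


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: split on whitespace, strip punctuation, lowercase."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [t for t in tokens if t]


def encode_document(text: str, avg_len: float, k1: float = 1.2, b: float = 0.75) -> dict:
    """Return a sparse vector {"indices": [...], "values": [...]}.

    Each value is the BM25 term-frequency component tf / (tf + norm),
    where norm grows with the document length relative to avg_len.
    """
    tokens = tokenize(text)
    counts = Counter(hash_token(t) for t in tokens)
    norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
    return {
        "indices": list(counts.keys()),
        "values": [tf / (tf + norm) for tf in counts.values()],
    }


sentences = [
    "In 2019, I visited Hungary",
    "In 2020, I visited Czech Republic",
    "In 2021, I visited Georgia",
]
avg_len = sum(len(tokenize(s)) for s in sentences) / len(sentences)
sparse_vectors = [encode_document(s, avg_len) for s in sentences]
print(sparse_vectors[0])
```

Hashing tokens to integer indices avoids keeping a vocabulary table: any token, seen or unseen, maps to a slot in a fixed 32-bit index space, at the cost of rare collisions.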

Features

  • Hybrid search combines full-text (sparse, BM25) queries and vector (dense embedding) queries, executed against a single search index that contains both searchable plain-text content and generated embeddings
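
One common way to combine the two signals is a convex weighting: scale the dense query vector by `alpha` and the sparse query vector by `1 - alpha`, then score documents by dot product. The sketch below illustrates that idea in plain Python. The function names `hybrid_scale` and `hybrid_score` are hypothetical, and whether this project sets a particular `alpha` is not stated in the README.

```python
def hybrid_scale(dense: list[float], sparse: dict, alpha: float):
    """Weight a query: alpha=1.0 is purely dense, alpha=0.0 is purely sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse) -> float:
    """Dot product over the dense part plus dot product over shared sparse indices."""
    dense_part = sum(q * d for q, d in zip(q_dense, d_dense))
    doc_values = dict(zip(d_sparse["indices"], d_sparse["values"]))
    sparse_part = sum(
        value * doc_values.get(index, 0.0)
        for index, value in zip(q_sparse["indices"], q_sparse["values"])
    )
    return dense_part + sparse_part


# Toy vectors, made up for illustration only.
query_dense = [1.0, 0.0]
query_sparse = {"indices": [7], "values": [2.0]}
doc_dense = [0.5, 0.5]
doc_sparse = {"indices": [7, 9], "values": [1.0, 3.0]}

scaled_dense, scaled_sparse = hybrid_scale(query_dense, query_sparse, alpha=0.5)
print(hybrid_score(scaled_dense, scaled_sparse, doc_dense, doc_sparse))
```

Only the query is scaled; documents are stored unweighted, so the balance between keyword and semantic matching can be changed per query without re-indexing.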

Demo

Input sentences: ['In 2019, I visited Hungary', 'In 2020, I visited Czech Republic', 'In 2021, I visited Georgia']
Custom query: What country did I visit first?
100%|██████████| 3/3 [00:00<00:00, 24.12it/s]
BM25 values saved to bm25_values.json
100%|██████████| 1/1 [00:02<00:00,  2.15s/it]
Query result: [Document(metadata={'score': 0.286206543}, page_content='In 2019, I visited Hungary'), Document(metadata={'score': 0.255560637}, page_content='In 2020, I visited Czech Republic'), Document(metadata={'score': 0.225382119}, page_content='In 2021, I visited Georgia')]

The generated bm25_values.json file is included in the repo.
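
The demo output is consistent with a pipeline along the following lines. This is a hedged sketch rather than the repository's actual script: the function name `build_retriever`, the `index_name` argument, and the cloud and region values are assumptions, and running it requires a Pinecone API key plus the `pinecone`, `pinecone-text`, `langchain-community`, and `langchain-huggingface` packages.

```python
def build_retriever(sentences, api_key, index_name):
    """Fit BM25 on the sentences, index them in Pinecone, return a hybrid retriever."""
    # Imported lazily so this module can be loaded without the packages installed.
    from langchain_community.retrievers import PineconeHybridSearchRetriever
    from langchain_huggingface import HuggingFaceEmbeddings
    from pinecone import Pinecone, ServerlessSpec
    from pinecone_text.sparse import BM25Encoder

    pc = Pinecone(api_key=api_key)
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=384,  # output size of all-MiniLM-L6-v2
            metric="dotproduct",  # sparse-dense indexes use dot product
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed values
        )
    index = pc.Index(index_name)

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    # Fit BM25 statistics on the corpus and persist them for later runs.
    bm25_encoder = BM25Encoder().default()
    bm25_encoder.fit(sentences)
    bm25_encoder.dump("bm25_values.json")

    retriever = PineconeHybridSearchRetriever(
        embeddings=embeddings, sparse_encoder=bm25_encoder, index=index
    )
    retriever.add_texts(sentences)
    return retriever
```

With credentials in place, `build_retriever(sentences, api_key, "my-index").invoke("What country did I visit first?")` would return a list of `Document` objects like the one shown in the demo; `"my-index"` is a placeholder name.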

Installing:

1. Clone this repo to your folder:

git clone THIS REPO

2. Create a virtual environment

3. Install the dependencies

pip install -r requirements.txt

Extrawest.com, 2024
