This Streamlit application implements a Langchain-based retrieval-augmented generation (RAG) system for processing PDF documents and answering questions about their content conversationally.
The application allows users to upload PDF files, extract their text, split it into chunks, generate embeddings with Google Palm, and build a conversational retrieval chain (see the sketch below). Users can then ask questions about the processed PDF content and receive answers from that chain.
- Langchain: A framework for building LLM applications, used here for text splitting, vector-store integration, and conversational retrieval.
- Google Palm Embeddings: Text embeddings that map document chunks and questions into a shared vector space for semantic similarity search.
- FAISS (Facebook AI Similarity Search): An efficient library for similarity search and clustering of dense vectors.
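The retrieval core these libraries provide can be illustrated with a minimal sketch (simplified for clarity, not the repository's exact code; it assumes `GOOGLE_API_KEY` is already set in the environment, and import paths may vary across Langchain versions):

```python
from langchain.embeddings import GooglePalmEmbeddings
from langchain.vectorstores import FAISS

# Toy chunks standing in for text extracted from uploaded PDFs.
chunks = [
    "FAISS indexes dense vectors for fast similarity search.",
    "Google Palm embeddings map text into a semantic vector space.",
]

embeddings = GooglePalmEmbeddings()  # uses GOOGLE_API_KEY from the environment
vector_store = FAISS.from_texts(chunks, embedding=embeddings)

# Retrieve the chunk most relevant to a question.
docs = vector_store.similarity_search("How is similarity search done?", k=1)
print(docs[0].page_content)
```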
- Python Environment: Make sure you have Python 3.x installed.
- Environment Variables: Create a `.env` file in the project root directory with the following content:
GOOGLE_API_KEY=your_google_api_key_here
Replace `your_google_api_key_here` with your actual Google API key.
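At runtime the key is typically loaded from `.env` with python-dotenv (a minimal sketch of that pattern, using the variable name above; the project's own loading code may differ):

```python
import os

from dotenv import load_dotenv

load_dotenv()                                  # reads .env from the project root
google_api_key = os.getenv("GOOGLE_API_KEY")   # None if the variable is missing
if not google_api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file.")
```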
- Clone the Repository: Clone this repository to your local machine:
git clone https://github.com/Varunv003/langchain-palm2-rag_application
- Set Up Virtual Environment: It's recommended to use a virtual environment to manage dependencies:
python -m venv venv
# On Windows: .\venv\Scripts\activate
# On macOS/Linux: source venv/bin/activate
- Install Dependencies: Install required Python packages using pip:
pip install -r requirements.txt
- Template Structure: To set up the initial folder structure of the project, run:
python template.py
# This command creates the directories and files the project structure requires.
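For context, such a scaffolding script usually looks roughly like the following (a hypothetical sketch; the file list here is illustrative, the real layout is defined in template.py):

```python
from pathlib import Path

# Illustrative file list; template.py defines the actual project layout.
files = ["app.py", "helper.py", "requirements.txt", ".env"]

for name in files:
    path = Path(name)
    path.parent.mkdir(parents=True, exist_ok=True)  # create parent directories if needed
    if not path.exists():
        path.touch()                                # create an empty placeholder file
```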
- Running the Application: To run the Streamlit application:
streamlit run app.py
# The application will start, and you can access it in your web browser at http://localhost:8501.
- app.py: Main Streamlit application code for uploading PDFs, processing them, and managing user interactions.
- helper.py: Contains helper functions for PDF text extraction, text chunking, FAISS vector store creation, and conversational chain setup.
- template.py: Script to initialize the folder structure and create the necessary directories/files for the project.
- .env: Environment variable file for storing sensitive data such as API keys.
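The helpers in helper.py described above might look roughly like this (a condensed sketch with assumed function names, chunk sizes, and import paths; the file itself is the reference implementation):

```python
from PyPDF2 import PdfReader
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import GooglePalmEmbeddings
from langchain.llms import GooglePalm
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS


def get_pdf_text(pdf_files):
    """Concatenate the text of every page of every uploaded PDF."""
    text = ""
    for pdf in pdf_files:
        for page in PdfReader(pdf).pages:
            text += page.extract_text() or ""
    return text


def get_text_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split the raw text into overlapping chunks for embedding."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    return splitter.split_text(text)


def get_vector_store(chunks):
    """Embed the chunks with Google Palm and index them in FAISS."""
    embeddings = GooglePalmEmbeddings()
    return FAISS.from_texts(chunks, embedding=embeddings)


def get_conversational_chain(vector_store):
    """Build a conversational retrieval chain with chat-history memory."""
    llm = GooglePalm(temperature=0.1)
    memory = ConversationBufferMemory(memory_key="chat_history",
                                      return_messages=True)
    return ConversationalRetrievalChain.from_llm(llm=llm,
                                                 retriever=vector_store.as_retriever(),
                                                 memory=memory)
```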
- Upload PDF Files: Use the "Upload Your Data" sidebar to upload one or more PDF files.
- Process PDFs: Click "Submit and Process" to extract text, generate embeddings, and set up a conversational retrieval chain.
- Ask Questions: Enter questions related to the uploaded PDF content in the text input field.
- View Responses: Responses generated by the Langchain conversational model will be displayed in the main interface.
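The steps above correspond roughly to the following Streamlit flow (a simplified sketch that reuses the assumed helper names from the sketch above; widget labels are illustrative and app.py in the repository is the reference):

```python
import streamlit as st

from helper import (get_conversational_chain, get_pdf_text, get_text_chunks,
                    get_vector_store)

st.title("Chat with your PDFs")

# Sidebar: upload and process the documents.
with st.sidebar:
    st.header("Upload Your Data")
    pdf_files = st.file_uploader("Upload PDF files", type="pdf",
                                 accept_multiple_files=True)
    if st.button("Submit and Process") and pdf_files:
        with st.spinner("Processing..."):
            raw_text = get_pdf_text(pdf_files)
            chunks = get_text_chunks(raw_text)
            vector_store = get_vector_store(chunks)
            st.session_state.chain = get_conversational_chain(vector_store)
        st.success("Done")

# Main pane: ask questions once a chain has been built.
question = st.text_input("Ask a question about the uploaded PDFs")
if question and "chain" in st.session_state:
    result = st.session_state.chain({"question": question})
    st.write(result["answer"])
```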
- Logging: Logging is implemented to capture key steps and timings during PDF text extraction, text chunking, vector store creation, and conversational chain setup. Logs are displayed in the console or terminal where the application is run.
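A minimal sketch of the timing/logging pattern this describes (the log format and message text here are illustrative, not copied from the project):

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

start = time.perf_counter()
# ... e.g. raw_text = get_pdf_text(pdf_files) ...
logger.info("PDF text extraction finished in %.2f s", time.perf_counter() - start)
```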
- Enhance error handling and user feedback during file upload and processing.
- Improve scalability and performance optimizations for handling larger PDF documents.
- Integrate additional AI models or refine existing models for better conversational responses.