The NanoMatter Custom Retrieval-Augmented Generation (RAG) App is designed to enable intelligent query answering by leveraging document-based knowledge. It currently processes PDF files to generate contextually relevant responses. Future iterations aim to support a broader range of file types and include an AI Agent for advanced query handling.
- PDF Parsing: Extracts information from PDF documents and generates responses using an open-source Hugging Face model.
- Contextual Query Handling: Provides accurate answers based on the contents of uploaded PDFs.
- Additional File Types: Support for other common document formats.
- AI Agent Integration:
  - Enhanced conversational capabilities.
  - Ability to synthesize responses across multiple documents and file types.
- Multi-Document Analysis: Simultaneous querying across multiple files.
- Search and Summarization: Advanced document search and concise summaries for quick insights.
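
Support for more file types could be organised as a loader registry keyed by file extension. This is a hypothetical sketch, not the project's code: `load_document` and `LOADERS` are illustrative names. The PDF branch uses PyPDF2's `PdfReader` (PyPDF2 is already a listed dependency); TXT and CSV use only the standard library, and XLSX is left out.

```python
import csv
from pathlib import Path
from typing import Callable, Dict


def load_txt(path: Path) -> str:
    # Plain text: read the whole file as UTF-8.
    return path.read_text(encoding="utf-8")


def load_csv(path: Path) -> str:
    # Flatten each CSV row into one comma-separated line of text.
    with path.open(newline="", encoding="utf-8") as f:
        return "\n".join(", ".join(row) for row in csv.reader(f))


def load_pdf(path: Path) -> str:
    # Imported lazily so the TXT/CSV loaders work without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    # Scanned pages have no text layer and yield little or no text (OCR needed);
    # `or ""` defensively treats a missing result as empty text.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Hypothetical registry mapping a lowercase file extension to its loader.
LOADERS: Dict[str, Callable[[Path], str]] = {
    ".pdf": load_pdf,
    ".txt": load_txt,
    ".csv": load_csv,
}


def load_document(path: str) -> str:
    # Dispatch on the extension; reject anything without a registered loader.
    p = Path(path)
    loader = LOADERS.get(p.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {p.suffix or '(none)'}")
    return loader(p)
```

Adding a format then means registering one more function, which keeps the rest of the pipeline unaware of file types.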
- Backend: Open-source Hugging Face models for natural language understanding and generation.
- Input Format: PDF (current version); CSV and XLSX (planned).
- Output: Text-based responses tailored to user queries.
- Frontend: Streamlit-based UI.
- File Upload Module: Handles PDF uploads and validates file format.
- Preprocessing: Extracts text from PDFs using OCR (if necessary) and prepares data for GPT-4.
- Query Engine:
  - Matches user queries with relevant document content.
  - Generates responses using GPT-4.
- Response Module: Returns precise and context-aware answers.
- File Format Conversion: Incorporate libraries for handling diverse file types.
- AI Agent Layer: A conversational AI module capable of cross-referencing data and learning from interactions.
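
The component list above can be illustrated end to end with a small, dependency-free sketch. It is not the project's implementation: `looks_like_pdf`, `chunk_text`, `retrieve`, and `build_prompt` are hypothetical names, and simple term-overlap scoring stands in for the vector search that the `faiss-cpu` dependency suggests. The prompt it returns is what would be passed to the language model.

```python
import re
from collections import Counter
from typing import List, Tuple

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    # File Upload Module stand-in: PDF files begin with the "%PDF-" header.
    return data.startswith(PDF_MAGIC)


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    # Preprocessing stand-in: split text into overlapping windows of `size` words.
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric terms; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[Tuple[int, str]]:
    # Query Engine stand-in: score each chunk by occurrences of the query's terms.
    # A FAISS-backed version would compare embeddings instead of raw term counts.
    terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, chunk))
    # Highest score first; ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    # Response Module stand-in: ground the model's answer in retrieved passages.
    context = "\n---\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    doc = "Gold melts at 1064 degrees Celsius. Silver melts at 962 degrees Celsius."
    question = "At what temperature does gold melt?"
    hits = retrieve(question, chunk_text(doc, size=6, overlap=1), k=1)
    print(build_prompt(question, [chunk for _, chunk in hits]))
```

Replacing `retrieve` with an embedding index changes only the scoring step; chunking and prompt assembly stay the same.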
- Python 3.8+
- Virtual Environment
- Required libraries (specified in `requirements.txt`):
  - openai
  - PyPDF2
  - langchain
  - faiss-cpu
- Clone the repository:
  ```bash
  git clone https://github.com/nanomatter/RAG-Nanomatter.git
  ```
- Navigate to the project directory:
  ```bash
  cd RAG-Nanomatter
  ```
- Create and activate a virtual environment:
  ```bash
  python3 -m venv env
  source env/bin/activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the Streamlit application:
  ```bash
  streamlit run app.py
  ```
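
After installing the dependencies, a short check such as the hypothetical script below can confirm that the interpreter meets the Python 3.8+ requirement and that each package from `requirements.txt` is importable. The package-to-module mapping is an assumption based on the listed packages; note that the `faiss-cpu` distribution is imported as `faiss`.

```python
import importlib.util
import sys
from typing import Dict, List, Tuple

# Assumed mapping from requirements.txt package names to their import names.
REQUIRED_MODULES: Dict[str, str] = {
    "openai": "openai",
    "PyPDF2": "PyPDF2",
    "langchain": "langchain",
    "faiss-cpu": "faiss",
}


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    # Compare (major, minor) of the running interpreter with the minimum.
    return sys.version_info[:2] >= minimum


def missing_packages() -> List[str]:
    # find_spec returns None when a top-level module cannot be located.
    return [pkg for pkg, mod in REQUIRED_MODULES.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```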
- Launch the app and upload a PDF document.
- Enter your query in the input field.
- Receive a detailed response based on the document content.
- Implement PDF parsing and query response using GPT-4.
- Add compatibility for CSV, TXT, XLSX, DOCX, and other formats.
- Develop an intelligent AI Agent for:
  - Advanced queries.
  - Multi-document handling.
  - Continuous learning.
- Implement robust search functionalities.
- Develop user-friendly dashboards for document management.
This project is licensed under the Apache License 2.0.