LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

LangGraphRAG

LangGraphRAG is a terminal-based Retrieval-Augmented Generation (RAG) system implemented using LangGraph. The architecture is designed to handle queries by routing them through a series of processes involving message history caching, query transformation, and document retrieval from a vector database.

Project Structure

The project is divided into several modules, each responsible for specific functionalities:

  1. Architecture: Defines the flow of the RAG system.
  2. Data: Contains data files and models.
  3. Modules: Houses the core logic and functions.

Setup Instructions

Follow these steps to set up and run the project:

  1. Clone the repository:

    git clone https://github.com/ranguy9304/LangGraphRAG.git
    cd LangGraphRAG
  2. Create a virtual environment:

    python3.12 -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the requirements:

    pip install -r requirements.txt
    choco install wkhtmltopdf  # Windows (Chocolatey); on other platforms, install wkhtmltopdf with your package manager
  4. Configure the environment variables:

    • Copy the example environment file:
      cp .env.example .env
    • Modify the .env file to add your OpenAI API key:
      OPENAI_API_KEY=your_gpt_key_here
    • Add your webpage URLs if using webpages (comma-separated, without quotation marks):
      URLS=url1,url2
    • Set GET_WEB_PAGES_TO_PDF to True if you are downloading webpages; otherwise set it to False:
      GET_WEB_PAGES_TO_PDF=False
    • Set CONVERT_PDF_TO_MD to True if you already have PDFs to convert; otherwise set it to False:
      CONVERT_PDF_TO_MD=True
    • If you are using PDF docs directly, store them in the path set in INTERMEDIATE_PDF_DIR.
    • If you are using .md docs directly, store them in the path set in DATA_DIR.
  5. Set up the documents. From the root directory, run:

    python modules/processDocs.py

    This sets up the webpages and docs. Don't forget to modify the document processing parameters in .env to suit your needs.

  6. Run the main program:

    python main.py
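
The flags from step 4 are plain strings in `.env`, so it helps to see how values such as `URLS=url1,url2` and `GET_WEB_PAGES_TO_PDF=False` might be interpreted. The following is a minimal sketch, not the project's own loader: the helper names `parse_bool`, `parse_urls`, and `load_settings` are hypothetical, and the accepted truthy spellings are an assumption.

```python
import os
from typing import Dict, List, Mapping, Optional


def parse_bool(value: str) -> bool:
    """Treat 'True', 'true', '1' or 'yes' as True; anything else as False (assumed convention)."""
    return value.strip().lower() in {"true", "1", "yes"}


def parse_urls(value: str) -> List[str]:
    """Split a comma-separated URL string, trimming whitespace and dropping empty entries."""
    return [url.strip() for url in value.split(",") if url.strip()]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Read the README's variables from a mapping (defaults to the process environment)."""
    source = os.environ if env is None else env
    return {
        "urls": parse_urls(source.get("URLS", "")),
        "get_web_pages_to_pdf": parse_bool(source.get("GET_WEB_PAGES_TO_PDF", "False")),
        "convert_pdf_to_md": parse_bool(source.get("CONVERT_PDF_TO_MD", "False")),
    }


if __name__ == "__main__":
    # Example values only; replace with the contents of your own .env file.
    print(load_settings({"URLS": "https://example.com/a, https://example.com/b",
                         "GET_WEB_PAGES_TO_PDF": "True"}))
```

Passing a plain dictionary, as in the usage line, makes it easy to check a configuration without touching the real environment.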

Usage

  • Queries are entered in the terminal and routed through message history caching, query transformation, and document retrieval from the vector database.
  • LangGraph manages the flow and the interactions between these modules.
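
To make the routing idea concrete, here is a minimal plain-Python sketch of one possible flow: check the message history cache, otherwise transform the query and retrieve documents. It does not use LangGraph and is not the repository's implementation. The ordering of the stages, the toy `DOCS` corpus, the keyword-overlap "retrieval", and every function name are assumptions made for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Toy corpus standing in for the vector database (hypothetical data).
DOCS: List[str] = [
    "LangGraph manages the flow between modules.",
    "The vector database stores document embeddings.",
]


def transform_query(query: str) -> str:
    """Stand-in for query transformation: lowercase, strip punctuation, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", query.lower())
    return " ".join(cleaned.split())


def keywords(text: str) -> Set[str]:
    """Words longer than three characters, used as a crude similarity signal."""
    return {word for word in transform_query(text).split() if len(word) > 3}


def retrieve(query: str, docs: List[str]) -> List[str]:
    """Stand-in for vector retrieval: keep documents sharing a keyword with the query."""
    wanted = keywords(query)
    return [doc for doc in docs if wanted & keywords(doc)]


def handle_query(query: str, history: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Route a query: answer from the history cache if seen before, else retrieve."""
    if query in history:
        return "cache", history[query]
    hits = retrieve(transform_query(query), DOCS)
    history[query] = hits  # remember the result for repeated questions
    return "retrieval", hits


if __name__ == "__main__":
    history: Dict[str, List[str]] = {}
    print(handle_query("What does the VECTOR database store?", history))
    print(handle_query("What does the VECTOR database store?", history))
```

In a graph-based version, each function would correspond to a node, and the cache check would decide which edge is followed.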

Diagrams

Vector DB Creation

RAG Architecture

Contribution

Feel free to fork the repository and submit pull requests. For major changes, please open an issue to discuss what you would like to change.

License

MIT
