Skip to content

Latest commit

 

History

History
226 lines (194 loc) · 10.1 KB

README.md

File metadata and controls

226 lines (194 loc) · 10.1 KB

Athena-mind-opensource

Athena-mind-opensource is a chatbot framework designed to save you from building a chatbot from scratch.

System Diagram

AthenaMindSystemDiagram

Key Features

  • Versatile Adapter Support for Multiple Use Cases
    • Ready-to-use RAG template to quickly set up a pre-defined RAG with your documents
    • Fully customizable using Langchain for more complex processes
  • Intelligent Routing Between Adapters
    • Seamless routing powered by LLM using only adapter names and roles
  • Production-Ready LLM & Vector Model Deployment
    • Ready for production deployment along with message queues for batch processing using Kafka
  • User-Friendly Interface with Chainlit
    • Communicates with REST API and provides a simple UI for chatting
  • Efficient Data Preparation
    • Prepares data from various formats for seamless processing and use by the RAG adapter created by the template

Prerequisites

Hardware Requirements

LLM CPU Memory GPU GPU Memory
Gemini (Cloud Service) 4 16GB - -
Hugging Face (Self-Hosted) 8 32GB 1 16 GB

Software Requirements

Ensure you have installed the following prerequisites on your development machine

Getting Started

The provided datasource refers to information about KBank’s bank account from the following links:

Here are the instructions to run with the sample data source provided in the data folder.

  1. Set up your .env file with your specific values.

    # Kafka config
    # KAFKA_EXPOSE_PORT MUST NOT be set to `9092`
    KAFKA_EXPOSE_PORT=9093
    
    # Mongo config
    MONGO_INITDB_ROOT_USERNAME= #REQUIRED
    MONGO_INITDB_ROOT_PASSWORD= #REQUIRED
    MONGO_EXPOSE_PORT=27017
    
    # Opensearch config
    OPENSEARCH_DASHBOARD_EXPOSE_PORT=5601
    
    # NLU
    NLU_SERVICE_EXPOSE_PORT=8901
    
    # Adapter Service
    ADAPTER_SERVICE_EXPOSE_PORT=8900
    
    # Gemini
    GEMINI_API_KEY= #REQUIRED
    
    ##########################
    # OpenTelemetry
    # To enable telemetry, you need to modify the `docker-compose.yml` file to configure the tracing UI first.
    # If you are using an example, here is a sample configuration.
    
    # ENABLE_TELEMETRY=True
    # TELEMETRY_COLLECTOR_ENDPOINT="http://jaeger:14268/api/traces?format=jaeger.thrift"
    ##########################
    ENABLE_TELEMETRY=False
    TELEMETRY_COLLECTOR_ENDPOINT=

    [!IMPORTANT] Ensure that KAFKA_EXPOSE_PORT is not configured as 9092

  2. Running service

    docker compose up --build -d

    Wait until data-prep is ready, then you can access the UI chatbot by opening http://localhost:8901 AthenaMindChainlit

  3. Sample command for using Docker Compose.

  • Check service health

    docker compose ps
  • View service logs

    # specific service
    docker compose logs {{service name}}
    # all service
    docker compose logs -f

    [!NOTE] {{service name}} obtained from the service name in docker-compose.yml.

  • Stop all services

    docker compose down

Project structure

  • data : Data source for RAG
  • deployments : Infra service deployments
  • scripts : Helper scripts that include functionalities such as:
    • Data preparation
    • Create kafka topic
  • services : All services provided by athena-mind-opensource.

Customized data source

Prepare your data in data folder with the following details:

  1. data/config.json

    [
        {
            "adapter": "web_account",
            "name": "Account Expert",
            "role": "Answer the question about four accounts: FCD (Foreign Currency Deposit), K-eSaving and other types of accounts",
            "type": "rag",
            "host": "service-adapter:8900",
            "config": {
                "dataset_path": "data/web_account/data/all_data.jsonl",
                "dataset_doc_id": "doc_id",
                "mongo_db_name": "athena-web-account",
                "mongo_collection_name": "testing",
                "opensearch_index": "athena-web-account",
                "prompt": {
                    "dir": "data/web_account/prompt",
                    "query_generate_prompt": "query_generation_prompt.txt",
                    "qa_prompt": "qa_prompt.txt",
                    "response_control_prompt": "rc_prompt.txt"
                }
            }
        },
        {
            "adapter": "general_handler",
            "name": "General Handler",
            "role": "Answer any questions about others",
            "type": "custom",
            "host": "service-adapter:8900"
        }
    ]

    Fields Description:

    • adapter - Must be the name of folder inside data folder.
    • name - Name of adapter
    • role - Short description about role of chatbot
    • type
      • custom
      • rag
    • host - Adapter endpoint
    • config
      • dataset_path - Path to refer your dataset
      • dataset_doc_id - Document key name
      • mongo_db_name - Name of db in mongo
      • mongo_collection_name - Name of collection in mongo
      • opensearch_index - Name of index in opensearch
      • prompt
        • dir - Directory that store prompt
        • query_generate_prompt - File name of prompt for query generation
        • qa_prompt - File name of prompt for question and answer prompt
        • response_control_prompt - File name of prompt for response control
  2. data/{{adapter}}/data/{{name}}.jsonl

    [!CAUTION] {{adapter}} - MUST MATCH with field adapter in data/config.json

    {
      "doc_id": "method_with_kplus",
      "url": [
        "https://www.kasikornbank.com/th/personal/Digital-banking/Pages/k-esavings-account-opening-have-kplus.aspx"
      ],
      "title": [
        "วิธีเปิดบัญชีออนไลน์ K-eSavings"
      ],
      "description": [
        "เปิดปุ๊บ ใช้งานได้ปั๊บ ภายใน 5 ขั้นตอน",
        "1. เข้าสู่ระบบ และเลือก \"บริการอื่น\" เพื่อเปิดบัญชีเงินฝาก",
        "2. เลือก \"เปิดบัญชีเงินฝาก\" ในบริการอื่น และเลือก \"เปิดบัญชีใหม่\"",
        "3. อ่านข้อตกลงและเงื่อนไข กรอกข้อมูลให้ครบถ้วน และเลือก \"ถัดไป\"",
        "4. ตรวจสอบข้อมูลอีกครั้งหลังจากกรอกข้อมูล หากถูกต้องแล้วเลือก \"ยืนยัน\"",
        "5. ลูกค้าจะได้รับรายละเอียดบัญชีผ่านทาง Feed SMS และอีเมลที่ได้ลงทะเบียนไว้กับธนาคาร"
      ]
    }

    JSON Object can be customized, but it should include the following details:

    • Document Key [string] - The field name MUST MATCH the value of the key config.dataset_doc_id from data/config.json (i.e., doc_id)
    • Data Fields [list of string] - Can have more than one fields (i.e., url, title, description)
  3. Re-create service

    docker compose up --build -d

Development

If you want to customize the service for a specific case, please refer to the instruction below.

References

  • Chainlit is an open-source Python package to build production ready Conversational AI.
  • LangGraph is a library within the LangChain ecosystem that provides a framework for defining, coordinating, and executing multiple LLM agents (or chains) in a structured and efficient manner.
  • LangChain is a framework for developing applications powered by large language models (LLMs).
  • LangServe is a Python framework that helps developers deploy LangChain runnable and chains as REST APIs

License

The project is licensed under the Apache License 2.0.