Athena-mind-opensource is a chatbot framework designed to save you from building a chatbot from scratch.
- Versatile Adapter Support for Multiple Use Cases
  - Ready-to-use RAG template to quickly set up a predefined RAG with your documents
  - Fully customizable using LangChain for more complex processes
- Intelligent Routing Between Adapters
  - Seamless routing powered by an LLM, using only adapter names and roles
- Production-Ready LLM & Vector Model Deployment
  - Ready for production deployment, along with Kafka message queues for batch processing
- User-Friendly Interface with Chainlit
  - Communicates with a REST API and provides a simple UI for chatting
- Efficient Data Preparation
  - Prepares data from various formats for seamless processing and use by the RAG adapter created by the template
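The routing feature above selects an adapter using only each adapter's name and role. A minimal sketch of how such an LLM routing prompt could be assembled — the `build_routing_prompt` helper and the prompt wording are hypothetical illustrations, not the framework's actual implementation:

```python
# Hypothetical sketch of LLM-based adapter routing: the router sees only
# each adapter's name and role, builds a selection prompt, and would ask
# the LLM to reply with a single adapter key. Prompt wording is illustrative.
adapters = [
    {"adapter": "web_account", "name": "Account Expert",
     "role": "Answer questions about bank accounts"},
    {"adapter": "general_handler", "name": "General Handler",
     "role": "Answer any questions about others"},
]

def build_routing_prompt(user_message: str) -> str:
    """Compose a routing prompt from adapter names and roles only."""
    lines = [f"- {a['adapter']}: {a['name']} ({a['role']})" for a in adapters]
    return (
        "Choose the best adapter for the user message.\n"
        "Adapters:\n" + "\n".join(lines) + "\n"
        f"User message: {user_message}\n"
        "Reply with the adapter key only."
    )

prompt = build_routing_prompt("How do I open a K-eSavings account?")
```

The returned string would then be sent to the configured LLM (e.g. Gemini), whose answer names the adapter to forward the message to.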
| LLM | CPU | Memory | GPU | GPU Memory |
|---|---|---|---|---|
| Gemini (Cloud Service) | 4 | 16 GB | - | - |
| Hugging Face (Self-Hosted) | 8 | 32 GB | 1 | 16 GB |
Ensure you have installed the following prerequisites on your development machine:
- Docker
- Docker Compose 2.20.3+
- MongoDB Compass (optional)
- Gemini API Key
- Hugging Face Access Token
The provided data source contains information about KBank's bank accounts, taken from the following links:
- K-eSavings Savings Account
- Opening a K-eSavings Savings Account via K PLUS
- Steps to Open a K-eSavings Account Online for New Customers
- Savings Account
- Fixed Deposit Account
- Current Account
- Foreign Currency Deposit Account
Here are the instructions to run with the sample data source provided in the `data` folder.
- Set up your `.env` file with your specific values.

  ```
  # Kafka config
  # KAFKA_EXPOSE_PORT MUST NOT be set to `9092`
  KAFKA_EXPOSE_PORT=9093

  # Mongo config
  MONGO_INITDB_ROOT_USERNAME= #REQUIRED
  MONGO_INITDB_ROOT_PASSWORD= #REQUIRED
  MONGO_EXPOSE_PORT=27017

  # Opensearch config
  OPENSEARCH_DASHBOARD_EXPOSE_PORT=5601

  # NLU
  NLU_SERVICE_EXPOSE_PORT=8901

  # Adapter Service
  ADAPTER_SERVICE_EXPOSE_PORT=8900

  # Gemini
  GEMINI_API_KEY= #REQUIRED

  ##########################
  # OpenTelemetry
  # To enable telemetry, modify the `docker-compose.yml` file to configure the tracing UI first.
  # If you are using the provided example, here is a sample configuration:
  # ENABLE_TELEMETRY=True
  # TELEMETRY_COLLECTOR_ENDPOINT="http://jaeger:14268/api/traces?format=jaeger.thrift"
  ##########################
  ENABLE_TELEMETRY=False
  TELEMETRY_COLLECTOR_ENDPOINT=
  ```
> [!IMPORTANT]
> Ensure that `KAFKA_EXPOSE_PORT` is not configured as `9092`.
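Before starting the stack, you can sanity-check your `.env` values. A minimal sketch, assuming the file follows the `KEY=value` format shown above — the parser and helper below are illustrative, not part of the framework:

```python
# Minimal .env sanity check: parse KEY=value lines (ignoring comments)
# and verify the values marked #REQUIRED above are filled in.
REQUIRED = ["MONGO_INITDB_ROOT_USERNAME", "MONGO_INITDB_ROOT_PASSWORD",
            "GEMINI_API_KEY"]

def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; skip blanks, comments, inline comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.split("#")[0].strip()
    return env

def missing_required(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]

sample = "KAFKA_EXPOSE_PORT=9093\nGEMINI_API_KEY=abc123\n"
env = parse_env(sample)
# The Kafka port must not collide with the broker's internal 9092 listener.
assert env.get("KAFKA_EXPOSE_PORT") != "9092"
```

Running `missing_required` on the sample above would still flag the two Mongo credentials, since only `GEMINI_API_KEY` is set.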
- Run the services:

  ```shell
  docker compose up --build -d
  ```

  Wait until the data-prep service is ready, then access the chatbot UI at http://localhost:8901.
Sample commands for using Docker Compose:

- Check service health:

  ```shell
  docker compose ps
  ```

- View service logs:

  ```shell
  # specific service
  docker compose logs {{service name}}

  # all services
  docker compose logs -f
  ```

  > [!NOTE]
  > `{{service name}}` is the service name defined in `docker-compose.yml`.

- Stop all services:

  ```shell
  docker compose down
  ```
- `data`: Data source for RAG
- `deployments`: Infra service deployments
- `scripts`: Helper scripts that include functionalities such as:
  - Data preparation
  - Kafka topic creation
- `services`: All services provided by athena-mind-opensource
Prepare your data in the `data` folder with the following details:
- `data/config.json`

  ```json
  [
    {
      "adapter": "web_account",
      "name": "Account Expert",
      "role": "Answer the question about four accounts: FCD (Foreign Currency Deposit), K-eSaving and other types of accounts",
      "type": "rag",
      "host": "service-adapter:8900",
      "config": {
        "dataset_path": "data/web_account/data/all_data.jsonl",
        "dataset_doc_id": "doc_id",
        "mongo_db_name": "athena-web-account",
        "mongo_collection_name": "testing",
        "opensearch_index": "athena-web-account",
        "prompt": {
          "dir": "data/web_account/prompt",
          "query_generate_prompt": "query_generation_prompt.txt",
          "qa_prompt": "qa_prompt.txt",
          "response_control_prompt": "rc_prompt.txt"
        }
      }
    },
    {
      "adapter": "general_handler",
      "name": "General Handler",
      "role": "Answer any questions about others",
      "type": "custom",
      "host": "service-adapter:8900"
    }
  ]
  ```
Field descriptions:

- `adapter` - Must match the name of a folder inside the `data` folder
- `name` - Name of the adapter
- `role` - Short description of the chatbot's role
- `type` - Either `rag` or `custom`
- `host` - Adapter endpoint
- `config`
  - `dataset_path` - Path to your dataset
  - `dataset_doc_id` - Document key name
  - `mongo_db_name` - Name of the MongoDB database
  - `mongo_collection_name` - Name of the MongoDB collection
  - `opensearch_index` - Name of the OpenSearch index
  - `prompt`
    - `dir` - Directory that stores the prompts
    - `query_generate_prompt` - File name of the query-generation prompt
    - `qa_prompt` - File name of the question-answering prompt
    - `response_control_prompt` - File name of the response-control prompt
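The required fields can be checked before starting the stack. A small sketch of such a validation — this checker, and the rule that `rag` adapters need a `config.dataset_path`, are inferred from the example above rather than taken from the framework's code:

```python
import json

# Illustrative validation of data/config.json entries: every adapter needs
# adapter/name/role/type/host, and a "rag" adapter additionally needs a
# config block with a dataset path. A sketch, not framework code.
CONFIG = """
[
  {"adapter": "web_account", "name": "Account Expert",
   "role": "Answer questions about bank accounts", "type": "rag",
   "host": "service-adapter:8900",
   "config": {"dataset_path": "data/web_account/data/all_data.jsonl",
              "dataset_doc_id": "doc_id"}},
  {"adapter": "general_handler", "name": "General Handler",
   "role": "Answer any questions about others", "type": "custom",
   "host": "service-adapter:8900"}
]
"""

def validate(entries: list) -> list:
    """Return a list of human-readable problems; empty means the config is OK."""
    errors = []
    for e in entries:
        for field in ("adapter", "name", "role", "type", "host"):
            if field not in e:
                errors.append(f"{e.get('adapter', '?')}: missing {field}")
        if e.get("type") == "rag" and "dataset_path" not in e.get("config", {}):
            errors.append(f"{e.get('adapter')}: rag adapter needs config.dataset_path")
    return errors

errors = validate(json.loads(CONFIG))
```

For the sample config above, `errors` comes back empty; a `rag` entry without a `config` block would be reported.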
- `data/{{adapter}}/data/{{name}}.jsonl`

  > [!CAUTION]
  > `{{adapter}}` MUST MATCH the `adapter` field in `data/config.json`.

  ```json
  {
    "doc_id": "method_with_kplus",
    "url": [
      "https://www.kasikornbank.com/th/personal/Digital-banking/Pages/k-esavings-account-opening-have-kplus.aspx"
    ],
    "title": [
      "วิธีเปิดบัญชีออนไลน์ K-eSavings"
    ],
    "description": [
      "เปิดปุ๊บ ใช้งานได้ปั๊บ ภายใน 5 ขั้นตอน",
      "1. เข้าสู่ระบบ และเลือก \"บริการอื่น\" เพื่อเปิดบัญชีเงินฝาก",
      "2. เลือก \"เปิดบัญชีเงินฝาก\" ในบริการอื่น และเลือก \"เปิดบัญชีใหม่\"",
      "3. อ่านข้อตกลงและเงื่อนไข กรอกข้อมูลให้ครบถ้วน และเลือก \"ถัดไป\"",
      "4. ตรวจสอบข้อมูลอีกครั้งหลังจากกรอกข้อมูล หากถูกต้องแล้วเลือก \"ยืนยัน\"",
      "5. ลูกค้าจะได้รับรายละเอียดบัญชีผ่านทาง Feed SMS และอีเมลที่ได้ลงทะเบียนไว้กับธนาคาร"
    ]
  }
  ```
The JSON object can be customized, but it should include the following details:

- Document Key [string] - The field name MUST MATCH the value of the key `config.dataset_doc_id` from `data/config.json` (i.e., `doc_id`)
- Data Fields [list of string] - Can have more than one field (e.g., `url`, `title`, `description`)
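The two constraints above can be checked per record. A minimal sketch, assuming `dataset_doc_id` is `doc_id` as in the sample config — the `check_record` helper is illustrative, not part of the framework:

```python
import json

# Sketch: verify that a .jsonl line has the document key named by
# config.dataset_doc_id ("doc_id" here) and that every remaining field
# is a list of strings, matching the constraints described above.
DATASET_DOC_ID = "doc_id"

record_line = json.dumps({
    "doc_id": "method_with_kplus",
    "url": ["https://www.kasikornbank.com/th/personal/Digital-banking/Pages/k-esavings-account-opening-have-kplus.aspx"],
    "title": ["How to open a K-eSavings account online"],
})

def check_record(line: str) -> bool:
    """Return True if one JSONL line satisfies the dataset constraints."""
    rec = json.loads(line)
    if DATASET_DOC_ID not in rec:
        return False
    data_fields = {k: v for k, v in rec.items() if k != DATASET_DOC_ID}
    return all(
        isinstance(v, list) and all(isinstance(s, str) for s in v)
        for v in data_fields.values()
    )
```

Applying `check_record` to every line of `data/{{adapter}}/data/{{name}}.jsonl` before re-creating the services catches malformed records early.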
- Re-create the services:

  ```shell
  docker compose up --build -d
  ```
If you want to customize the service for a specific use case, please refer to the instructions below.
- Create adapter using predefined RAG
- Create your customized adapter
- Change the Vector Model
- Change the LLM Model
- Chainlit is an open-source Python package for building production-ready conversational AI.
- LangGraph is a library in the LangChain ecosystem that provides a framework for defining, coordinating, and executing multiple LLM agents (or chains) in a structured and efficient manner.
- LangChain is a framework for developing applications powered by large language models (LLMs).
- LangServe is a Python framework that helps developers deploy LangChain runnables and chains as REST APIs.
The project is licensed under the Apache License 2.0.