Athena-mind-opensource is a chatbot framework designed to save you from building a chatbot from scratch.
- Versatile Adapter Support for Multiple Use Cases
  - Ready-to-use RAG template to quickly set up a predefined RAG with your documents
  - Fully customizable using LangChain for more complex processes
- Intelligent Routing Between Adapters
  - Seamless routing powered by an LLM, using only adapter names and roles
- Production-Ready LLM & Vector Model Deployment
  - Ready for production deployment, along with Kafka message queues for batch processing
- User-Friendly Interface with Chainlit
  - Communicates with a REST API and provides a simple UI for chatting
- Efficient Data Preparation
  - Prepares data from various formats for seamless processing and use by the RAG adapter created by the template
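The routing feature above selects an adapter using only each adapter's name and role. A minimal sketch of how such an LLM routing prompt could be assembled — the `build_routing_prompt` helper and the prompt wording are hypothetical illustrations, not the framework's actual implementation:

```python
# Hypothetical sketch of LLM-based adapter routing: the router sees only
# each adapter's name and role, builds a selection prompt, and would ask
# the LLM to reply with a single adapter key. Prompt wording is illustrative.
adapters = [
    {"adapter": "web_account", "name": "Account Expert",
     "role": "Answer questions about bank accounts"},
    {"adapter": "general_handler", "name": "General Handler",
     "role": "Answer any questions about others"},
]

def build_routing_prompt(user_message: str) -> str:
    """Compose a routing prompt from adapter names and roles only."""
    lines = [f"- {a['adapter']}: {a['name']} ({a['role']})" for a in adapters]
    return (
        "Choose the best adapter for the user message.\n"
        "Adapters:\n" + "\n".join(lines) + "\n"
        f"User message: {user_message}\n"
        "Reply with the adapter key only."
    )

prompt = build_routing_prompt("How do I open a K-eSavings account?")
```

The returned string would then be sent to the configured LLM (e.g. Gemini), whose answer names the adapter to forward the message to.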
| LLM | CPU | Memory | GPU | GPU Memory |
|---|---|---|---|---|
| Gemini (Cloud Service) | 4 | 16 GB | - | - |
| Hugging Face (Self-Hosted) | 8 | 32 GB | 1 | 16 GB |
Ensure you have installed the following prerequisites on your development machine:
- Docker
- Docker Compose 2.20.3+
- MongoDB Compass (optional)
- Gemini API Key
- Hugging Face Access Token
The provided data source contains information about KBank's bank accounts, taken from the following links:
- K-eSavings Savings Account
- Opening a K-eSavings Savings Account via K PLUS
- Steps to Open a K-eSavings Account Online for New Customers
- Savings Account
- Fixed Deposit Account
- Current Account
- Foreign Currency Deposit Account
Here are the instructions to run with the sample data source provided in the `data` folder.
- Set up your `.env` file with your specific values.

  ```
  # Kafka config
  # KAFKA_EXPOSE_PORT MUST NOT be set to `9092`
  KAFKA_EXPOSE_PORT=9093

  # Mongo config
  MONGO_INITDB_ROOT_USERNAME= #REQUIRED
  MONGO_INITDB_ROOT_PASSWORD= #REQUIRED
  MONGO_EXPOSE_PORT=27017

  # Opensearch config
  OPENSEARCH_DASHBOARD_EXPOSE_PORT=5601

  # NLU
  NLU_SERVICE_EXPOSE_PORT=8901

  # Adapter Service
  ADAPTER_SERVICE_EXPOSE_PORT=8900

  # Gemini
  GEMINI_API_KEY= #REQUIRED

  ##########################
  # OpenTelemetry
  # To enable telemetry, modify the `docker-compose.yml` file to configure the tracing UI first.
  # If you are using the provided example, here is a sample configuration:
  # ENABLE_TELEMETRY=True
  # TELEMETRY_COLLECTOR_ENDPOINT="http://jaeger:14268/api/traces?format=jaeger.thrift"
  ##########################
  ENABLE_TELEMETRY=False
  TELEMETRY_COLLECTOR_ENDPOINT=
  ```
> [!IMPORTANT]
> Ensure that `KAFKA_EXPOSE_PORT` is not configured as `9092`.
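Before starting the stack, you can sanity-check your `.env` values. A minimal sketch, assuming the file follows the `KEY=value` format shown above — the parser and helper below are illustrative, not part of the framework:

```python
# Minimal .env sanity check: parse KEY=value lines (ignoring comments)
# and verify the values marked #REQUIRED above are filled in.
REQUIRED = ["MONGO_INITDB_ROOT_USERNAME", "MONGO_INITDB_ROOT_PASSWORD",
            "GEMINI_API_KEY"]

def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines; skip blanks, comments, inline comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.split("#")[0].strip()
    return env

def missing_required(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]

sample = "KAFKA_EXPOSE_PORT=9093\nGEMINI_API_KEY=abc123\n"
env = parse_env(sample)
# The Kafka port must not collide with the broker's internal 9092 listener.
assert env.get("KAFKA_EXPOSE_PORT") != "9092"
```

Running `missing_required` on the sample above would still flag the two Mongo credentials, since only `GEMINI_API_KEY` is set.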
- Run the services:

  ```shell
  docker compose up --build -d
  ```

  Wait until the data-prep service is ready, then access the chatbot UI at http://localhost:8901.
Sample commands for using Docker Compose:

- Check service health:

  ```shell
  docker compose ps
  ```

- View service logs:

  ```shell
  # specific service
  docker compose logs {{service name}}

  # all services
  docker compose logs -f
  ```

  > [!NOTE]
  > `{{service name}}` is the service name defined in `docker-compose.yml`.

- Stop all services:

  ```shell
  docker compose down
  ```
- `data`: Data source for RAG
- `deployments`: Infra service deployments
- `scripts`: Helper scripts that include functionalities such as:
  - Data preparation
  - Kafka topic creation
- `services`: All services provided by athena-mind-opensource
Prepare your data in the `data` folder with the following details:
- `data/config.json`

  ```json
  [
    {
      "adapter": "web_account",
      "name": "Account Expert",
      "role": "Answer the question about four accounts: FCD (Foreign Currency Deposit), K-eSaving and other types of accounts",
      "type": "rag",
      "host": "service-adapter:8900",
      "config": {
        "dataset_path": "data/web_account/data/all_data.jsonl",
        "dataset_doc_id": "doc_id",
        "mongo_db_name": "athena-web-account",
        "mongo_collection_name": "testing",
        "opensearch_index": "athena-web-account",
        "prompt": {
          "dir": "data/web_account/prompt",
          "query_generate_prompt": "query_generation_prompt.txt",
          "qa_prompt": "qa_prompt.txt",
          "response_control_prompt": "rc_prompt.txt"
        }
      }
    },
    {
      "adapter": "general_handler",
      "name": "General Handler",
      "role": "Answer any questions about others",
      "type": "custom",
      "host": "service-adapter:8900"
    }
  ]
  ```
Field descriptions:

- `adapter` - Must match the name of a folder inside the `data` folder
- `name` - Name of the adapter
- `role` - Short description of the chatbot's role
- `type` - Either `rag` or `custom`
- `host` - Adapter endpoint
- `config`
  - `dataset_path` - Path to your dataset
  - `dataset_doc_id` - Document key name
  - `mongo_db_name` - Name of the MongoDB database
  - `mongo_collection_name` - Name of the MongoDB collection
  - `opensearch_index` - Name of the OpenSearch index
  - `prompt`
    - `dir` - Directory that stores the prompts
    - `query_generate_prompt` - File name of the query-generation prompt
    - `qa_prompt` - File name of the question-answering prompt
    - `response_control_prompt` - File name of the response-control prompt
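The required fields can be checked before starting the stack. A small sketch of such a validation — this checker, and the rule that `rag` adapters need a `config.dataset_path`, are inferred from the example above rather than taken from the framework's code:

```python
import json

# Illustrative validation of data/config.json entries: every adapter needs
# adapter/name/role/type/host, and a "rag" adapter additionally needs a
# config block with a dataset path. A sketch, not framework code.
CONFIG = """
[
  {"adapter": "web_account", "name": "Account Expert",
   "role": "Answer questions about bank accounts", "type": "rag",
   "host": "service-adapter:8900",
   "config": {"dataset_path": "data/web_account/data/all_data.jsonl",
              "dataset_doc_id": "doc_id"}},
  {"adapter": "general_handler", "name": "General Handler",
   "role": "Answer any questions about others", "type": "custom",
   "host": "service-adapter:8900"}
]
"""

def validate(entries: list) -> list:
    """Return a list of human-readable problems; empty means the config is OK."""
    errors = []
    for e in entries:
        for field in ("adapter", "name", "role", "type", "host"):
            if field not in e:
                errors.append(f"{e.get('adapter', '?')}: missing {field}")
        if e.get("type") == "rag" and "dataset_path" not in e.get("config", {}):
            errors.append(f"{e.get('adapter')}: rag adapter needs config.dataset_path")
    return errors

errors = validate(json.loads(CONFIG))
```

For the sample config above, `errors` comes back empty; a `rag` entry without a `config` block would be reported.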
- `data/{{adapter}}/data/{{name}}.jsonl`

  > [!CAUTION]
  > `{{adapter}}` MUST MATCH the `adapter` field in `data/config.json`.

  ```json
  {
    "doc_id": "method_with_kplus",
    "url": [
      "https://www.kasikornbank.com/th/personal/Digital-banking/Pages/k-esavings-account-opening-have-kplus.aspx"
    ],
    "title": [
      "วิธีเปิดบัญชีออนไลน์ K-eSavings"
    ],
    "description": [
      "เปิดปุ๊บ ใช้งานได้ปั๊บ ภายใน 5 ขั้นตอน",
      "1. เข้าสู่ระบบ และเลือก \"บริการอื่น\" เพื่อเปิดบัญชีเงินฝาก",
      "2. เลือก \"เปิดบัญชีเงินฝาก\" ในบริการอื่น และเลือก \"เปิดบัญชีใหม่\"",
      "3. อ่านข้อตกลงและเงื่อนไข กรอกข้อมูลให้ครบถ้วน และเลือก \"ถัดไป\"",
      "4. ตรวจสอบข้อมูลอีกครั้งหลังจากกรอกข้อมูล หากถูกต้องแล้วเลือก \"ยืนยัน\"",
      "5. ลูกค้าจะได้รับรายละเอียดบัญชีผ่านทาง Feed SMS และอีเมลที่ได้ลงทะเบียนไว้กับธนาคาร"
    ]
  }
  ```
The JSON object can be customized, but it should include the following details:

- Document Key [string] - The field name MUST MATCH the value of the key `config.dataset_doc_id` from `data/config.json` (i.e., `doc_id`)
- Data Fields [list of string] - Can have more than one field (e.g., `url`, `title`, `description`)
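The two constraints above can be checked per record. A minimal sketch, assuming `dataset_doc_id` is `doc_id` as in the sample config — the `check_record` helper is illustrative, not part of the framework:

```python
import json

# Sketch: verify that a .jsonl line has the document key named by
# config.dataset_doc_id ("doc_id" here) and that every remaining field
# is a list of strings, matching the constraints described above.
DATASET_DOC_ID = "doc_id"

record_line = json.dumps({
    "doc_id": "method_with_kplus",
    "url": ["https://www.kasikornbank.com/th/personal/Digital-banking/Pages/k-esavings-account-opening-have-kplus.aspx"],
    "title": ["How to open a K-eSavings account online"],
})

def check_record(line: str) -> bool:
    """Return True if one JSONL line satisfies the dataset constraints."""
    rec = json.loads(line)
    if DATASET_DOC_ID not in rec:
        return False
    data_fields = {k: v for k, v in rec.items() if k != DATASET_DOC_ID}
    return all(
        isinstance(v, list) and all(isinstance(s, str) for s in v)
        for v in data_fields.values()
    )
```

Applying `check_record` to every line of `data/{{adapter}}/data/{{name}}.jsonl` before re-creating the services catches malformed records early.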
- Re-create the services:

  ```shell
  docker compose up --build -d
  ```
If you want to customize the service for a specific use case, please refer to the instructions below.
- Create adapter using predefined RAG
- Create your customized adapter
- Change the Vector Model
- Change the LLM Model
- Chainlit is an open-source Python package for building production-ready conversational AI.
- LangGraph is a library in the LangChain ecosystem that provides a framework for defining, coordinating, and executing multiple LLM agents (or chains) in a structured and efficient manner.
- LangChain is a framework for developing applications powered by large language models (LLMs).
- LangServe is a Python framework that helps developers deploy LangChain runnables and chains as REST APIs.
The project is licensed under the Apache License 2.0.