This repo provides Automatic Interactive Evaluation (AIE), a framework for assessing the medical consultation capability of LLMs.
The datasets are included in `./data/datasets`:
- Patient_Simulator_Test: 50 cases sampled from real hospital cases, used to test the patient simulator.
- HospitalCases: 50 cases sampled from real hospital cases, used to test the doctor LLMs.
- MedicalExam: 150 cases sampled from five publicly available medical examination datasets, used to test the doctor LLMs.
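To get a sense of the case format before running anything, you can peek at one of the dataset files. This is just a sketch; it only assumes each dataset is a single JSON file, as the `--input-file-name` arguments below suggest.

```bash
# Peek at a dataset file without assuming its internal schema.
python - <<'EOF'
import json
with open("./data/datasets/HospitalCases.json", encoding="utf-8") as f:
    data = json.load(f)
print(type(data).__name__, len(data))
EOF
```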
The experiment results are included in `./data/results`:
- raw_data: the consultation conversations between the doctor LLMs and the patient simulator, generated by the AIE framework.
- scores: the metrics calculated on the corresponding consultation conversations, including:
  - auto: automatic metric evaluation scores
  - gpt4: GPT-4 evaluation scores
  - human: human evaluation scores
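For orientation, you can list what ships under `./data/results`; the folder names below come from the description above, while the per-model file names depend on which runs produced them.

```bash
# Recursively list the released conversations and scores.
ls -R ./data/results
# Expected top level (per the description above): raw_data/ and scores/,
# with scores/ split into auto/, gpt4/, and human/.
```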
```bash
conda create -n {env_name} python=3.8.10
pip install -r requirement.txt
cd src
```
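The scripts below call GPT-4 as the patient simulator and state model, so OpenAI API credentials are presumably needed. `OPENAI_API_KEY` is the standard variable for the official client, but check how this repo actually reads its keys; the line below is an assumption, not a documented requirement.

```bash
# Assumption: credentials are read from the environment.
export OPENAI_API_KEY=your_key_here
```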
If you want to validate your personalized patient simulator, you can run the script below to verify its behavior within the AIE framework.
```bash
data_root=../data/datasets
data_name=Patient_Simulator_Test
output_root=../results/your_results
patient_prompt=base_v9_zh
patient_model=gpt4 # evaluated patient simulator

CUDA_VISIBLE_DEVICES=0 python patient_test.py \
    --mode ninth \
    --input-file-name ${data_root}/${data_name}.json \
    --output-file-name ${output_root}/${data_name}/${patient_model}.json \
    --patient-prompt-id ${patient_prompt} \
    --patient-model ${patient_model} \
    --max-turn 10 \
    --workers 1
```
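To compare several candidate simulators on the same test set, the call above extends naturally to a sweep. A minimal sketch; `your_other_model` is a placeholder, not necessarily a model identifier this repo supports.

```bash
# Hypothetical sweep over candidate patient simulators; each run writes
# its transcripts to a separate file under ${output_root}/${data_name}/.
for patient_model in gpt4 your_other_model; do
  CUDA_VISIBLE_DEVICES=0 python patient_test.py \
    --mode ninth \
    --input-file-name ${data_root}/${data_name}.json \
    --output-file-name ${output_root}/${data_name}/${patient_model}.json \
    --patient-prompt-id ${patient_prompt} \
    --patient-model ${patient_model} \
    --max-turn 10 \
    --workers 1
done
```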
```bash
data_root=../data/datasets
data_name=HospitalCases
output_root=../results/your_results
patient_prompt=base_v9_zh
doctor_prompt=base_v3_zh
patient_model=gpt4
state_model=gpt4
cuda_id=0
doctor_model=gpt4 # evaluated doctor LLM

CUDA_VISIBLE_DEVICES=$cuda_id python consultation.py \
    \
    `# base configs` \
    --mode ninth \
    --input-file-name ${data_root}/${data_name}.json \
    --output-file-name ${output_root}/${data_name}/${doctor_model}.json \
    \
    `# patient configs` \
    --patient-prompt-id ${patient_prompt} \
    --patient-model ${patient_model} \
    \
    `# doctor configs` \
    --doctor-prompt-id ${doctor_prompt} \
    --doctor-model ${doctor_model} \
    \
    `# conversation configs` \
    --state-model ${state_model} \
    --diagnosis-model ${doctor_model} \
    --max-turn 10 \
    --workers 10
```
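The same pattern works for benchmarking several doctor LLMs against a fixed GPT-4 patient simulator. Again a sketch; `your_finetuned_model` is a placeholder for whatever model identifiers the repo accepts.

```bash
# Hypothetical sweep over doctor LLMs on the same dataset.
for doctor_model in gpt4 your_finetuned_model; do
  CUDA_VISIBLE_DEVICES=$cuda_id python consultation.py \
    --mode ninth \
    --input-file-name ${data_root}/${data_name}.json \
    --output-file-name ${output_root}/${data_name}/${doctor_model}.json \
    --patient-prompt-id ${patient_prompt} \
    --patient-model ${patient_model} \
    --doctor-prompt-id ${doctor_prompt} \
    --doctor-model ${doctor_model} \
    --state-model ${state_model} \
    --diagnosis-model ${doctor_model} \
    --max-turn 10 \
    --workers 10
done
```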
- Patient Evaluation
```bash
data_name=Patient_Simulator_Test
output_root=../results/your_results
python ./eval/patient_eval.py --folder-path ${output_root}/${data_name}
```
- Doctor Evaluation
```bash
data_name=HospitalCases
output_root=../results/your_results
python ./eval/doctor_eval.py --folder-path ${output_root}/${data_name}
```
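If both result folders exist, the two evaluations can be run back to back; a minimal sketch, assuming `output_root` from the sections above is still set:

```bash
# Score the patient-simulator run and the doctor-LLM run in one go.
python ./eval/patient_eval.py --folder-path ${output_root}/Patient_Simulator_Test
python ./eval/doctor_eval.py --folder-path ${output_root}/HospitalCases
```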
```bibtex
@article{liao2024automatic,
  title={Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator},
  author={Liao, Yusheng and Meng, Yutong and Wang, Yuhao and Liu, Hongcheng and Wang, Yanfeng and Wang, Yu},
  journal={arXiv preprint arXiv:2403.08495},
  year={2024}
}
```