evaluate

Description

To make it easier to reproduce our experimental results, we release our evaluation code. We measure model performance on mainstream open-source benchmarks (MMLU, CMMLU, C-Eval, ...), using OpenCompass as the main evaluation framework with adaptive modifications on top of it.

Quick Start

  1. Environment Setup
conda create --name benchmark_env python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate benchmark_env
git clone llm_benchmark_repo_url llm_benchmark
cd llm_benchmark
pip install -e .
  2. Data Download
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
  3. Evaluation

    • Run with python command:

      python run.py --datasets mmlu_ppl ceval_ppl cmmlu_ppl ARC_c_ppl ARC_e_ppl hellaswag_ppl gsm8k_gen humaneval_gen \
          --hf-path ./models/model_name \
          --model-kwargs device_map='auto' trust_remote_code=True \
          --tokenizer-kwargs padding_side='left' truncation='left' use_fast=False trust_remote_code=True \
          --max-out-len 1 \
          --max-seq-len 4096 \
          --batch-size 64 \
          --no-batch-padding \
          --num-gpus 1
    • Run with config file:

      Define the task_file in run_local_test.py (a sketch of what such a task config might look like is given after these steps), then run the following command:

      ./run_local_test.sh
  4. Get dataset config file

    Use the following Python command to list the available dataset configs:

    # dataset name like mmlu or arc
    python ./tools/list_configs.py mmlu arc
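
The task_file referenced in the "Run with config file" step is a standard OpenCompass Python config. The sketch below only illustrates its general shape, assuming the stock OpenCompass configs/ layout; the dataset import paths, the model path ./models/model_name, and the file name eval_model.py are assumptions that mirror the command-line example above and may need adjusting for this repository's modifications.

# eval_model.py -- illustrative OpenCompass task config (paths and settings are assumptions)
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # dataset configs shipped with OpenCompass; module names can differ between versions
    from .datasets.mmlu.mmlu_ppl import mmlu_datasets
    from .datasets.ceval.ceval_ppl import ceval_datasets

datasets = [*mmlu_datasets, *ceval_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='model_name',
        path='./models/model_name',
        tokenizer_path='./models/model_name',
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        tokenizer_kwargs=dict(padding_side='left', truncation_side='left',
                              use_fast=False, trust_remote_code=True),
        max_out_len=1,
        max_seq_len=4096,
        batch_size=64,
        run_cfg=dict(num_gpus=1),
    )
]

A config of this form, saved under OpenCompass's configs/ directory, can also be launched directly with python run.py configs/eval_model.py.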

Acknowledgements

Thanks to the following projects, whose open-source releases were a great help in quickly building a comparable benchmark: