Authors: Huaye Zeng, Dongfu Jiang, HaoZhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen @ TIGER-Lab
- [2025/2/3] We release the AceCoder Paper, along with the 🤗 Models and Datasets on Hugging Face.
Abstract
- We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests for reward model training and reinforcement learning in the coding scenario. To do this, we curate the dataset AceCode-89K: starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and then filter out the noisy ones (a minimal sketch of this filtering step follows the abstract).
- We train two reward models, AceCodeRM-7B and AceCodeRM-32B, on the constructed preference pairs. Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvements.
- We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B, or a rule-based reward, i.e. the binary pass rate over the test cases in the dataset (sketched after the model table below). Additionally, we experiment with RL directly from the base model, in the style of DeepSeek-R1. Results show that RL directly from the base Qwen2.5-Coder model yields a 25% improvement on HumanEval-plus and 6% on MBPP-plus within just 80 optimization steps.
- To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests for reward model training and reinforcement learning in the coding scenario. We believe AceCode-89K will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.
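
The test-synthesis and filtering step described above can be pictured with the minimal sketch below. The helpers `generate_tests` and `generate_solutions` are hypothetical stand-ins for LLM calls (e.g., to GPT-4o-mini); the exact prompts, number of sampled solutions, and filtering rules used to build AceCode-89K are described in the paper.

```python
from typing import Callable, List

def test_passes(program: str, test: str) -> bool:
    """Run one assert-style test against a candidate program.
    A real pipeline would sandbox this and enforce a per-test timeout."""
    env: dict = {}
    try:
        exec(program, env)   # define the candidate solution
        exec(test, env)      # e.g. "assert add(1, 2) == 3"
        return True
    except Exception:
        return False

def synthesize_and_filter(
    question: str,
    generate_tests: Callable[[str], List[str]],           # LLM "imagines" assert statements
    generate_solutions: Callable[[str, int], List[str]],  # LLM writes candidate programs
    n_solutions: int = 4,
    min_pass: int = 1,
) -> List[str]:
    """Keep only the synthesized tests that at least `min_pass` candidate
    solutions satisfy, discarding noisy or unsatisfiable tests."""
    tests = generate_tests(question)
    programs = generate_solutions(question, n_solutions)
    return [t for t in tests if sum(test_passes(p, t) for p in programs) >= min_pass]
```

The same pass/fail signal can then be reused downstream, both to build preference pairs for reward-model training and to compute rule-based rewards during RL.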
- AceCode-89K: The first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini
- AceCodePair-300K: Preference pairs constructed from AceCode-89K for training the reward models
- AceCode-89K-hard: A harder subset of AceCode-89K; you can sample the hardest 25% of examples via this script
- AceCodeRM-7B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct
- AceCodeRM-32B: A reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct
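
The reward models above are evaluated with best-of-N sampling. The sketch below shows only the reranking logic; `generate_candidates` and `rm_score` are hypothetical stand-ins for sampling programs from a policy model and scoring a (question, program) pair with AceCodeRM-7B/32B (see the model cards for the exact loading and scoring code).

```python
from typing import Callable, List

def best_of_n(
    question: str,
    generate_candidates: Callable[[str, int], List[str]],  # sample N programs from the policy
    rm_score: Callable[[str, str], float],                  # reward model score for (question, program)
    n: int = 16,
) -> str:
    """Sample n candidate programs and return the one the reward model ranks highest."""
    candidates = generate_candidates(question, n)
    return max(candidates, key=lambda program: rm_score(question, program))
```

Ties are broken arbitrarily by `max`; in practice the reward-model forward passes would be batched for efficiency.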
| Initial Policy Model | Reward Type | Training Dataset | Final RL Model |
|---|---|---|---|
| Qwen2.5-7B-Instruct | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM |
| Qwen2.5-7B-Instruct | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule |
| Qwen2.5-Coder-7B-Instruct | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM |
| Qwen2.5-Coder-7B-Instruct | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule |
| Qwen2.5-Coder-7B | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM |
| Qwen2.5-Coder-7B | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule |
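
For the rows marked "Rule", the reward comes from executing the policy's program against the test cases stored with each prompt in AceCode-89K-hard. The sketch below illustrates the idea only: execution is unsandboxed here, and the exact rule (pass-all versus a thresholded pass rate) follows the paper.

```python
from typing import List

def test_passes(program: str, test: str) -> bool:
    """Run one assert-style test; a real RL loop would sandbox this with a timeout."""
    env: dict = {}
    try:
        exec(program, env)   # define the policy's solution
        exec(test, env)      # run one stored assert
        return True
    except Exception:
        return False

def rule_based_reward(program: str, tests: List[str], threshold: float = 1.0) -> float:
    """Binary reward: 1.0 if the pass rate over the prompt's tests reaches `threshold`, else 0.0."""
    if not tests:
        return 0.0
    pass_rate = sum(test_passes(program, t) for t in tests) / len(tests)
    return 1.0 if pass_rate >= threshold else 0.0
```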
See our website or paper for detailed performance report.
```bash
git submodule init
git submodule update
```
(TODO)
See train/train_rl/README.md for detailed instructions.
(TODO)
If you find this work helpful, please consider citing:
@article{AceCoder,
title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
journal={ArXiv},
year={2025},
volume={abs/2502.01718}
}