The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"


🂡 AceCoder


Authors: Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen @ TIGER-Lab

🔥News

Overview

[Overview figure: assets/images/ac_overview.png]

Abstract
  • We introduce AceCoder, the first work to propose a fully automated pipeline for synthesizing large-scale, reliable test cases for reward model training and reinforcement learning in the coding scenario. To this end, we curate the dataset AceCode-89K: starting from a seed code dataset, we prompt powerful LLMs to "imagine" proper test cases for each coding question and filter out the noisy ones.

  • We train two reward models, AceCodeRM-7B and AceCodeRM-32B, on the constructed preference pairs. Best-of-N sampling results on HumanEval(+), MBPP(+), BigCodeBench, and LiveCodeBench (V4) show consistent improvements.

  • We perform RL training from three policy models: Qwen2.5-7B-Instruct, Qwen2.5-Coder-7B-Base, and Qwen2.5-Coder-7B-Instruct. Two types of reward can be used: the trained reward model AceCodeRM-7B, or a rule-based reward, i.e. the binary pass rate over the test cases in the dataset (see the sketch after this list). Additionally, we also experiment with RL directly from the base model, as in DeepSeek-R1. Results show that RL from the base Qwen2.5-Coder model yields a 25% improvement on HumanEval-plus and 6% on MBPP-plus within just 80 optimization steps.

  • To our knowledge, this is the first work to propose a fully automated pipeline for synthesizing large-scale reliable tests used for reward model training and reinforcement learning in the coding scenario. We believe AceCode-89K will unlock the potential of RL training for code generation models and help the community further push the boundaries of LLMs' coding abilities.
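As a minimal illustration of the rule-based reward mentioned above (a sketch only; the function name, sandboxing, and timeout here are assumptions, not the exact training code), a binary pass-rate reward can be computed by running a candidate program against its synthesized test cases:

# Hedged sketch: binary pass-rate reward over synthesized test cases.
# The sandboxing here is minimal; the actual training pipeline may differ.
import os
import subprocess
import tempfile

def rule_based_reward(program: str, test_cases: list[str], timeout: float = 5.0) -> float:
    """Return 1.0 if the candidate program passes every test case, else 0.0."""
    # Concatenate the candidate program with assert-style test cases.
    script = program + "\n" + "\n".join(test_cases)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.remove(path)

# Toy usage example.
program = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(rule_based_reward(program, tests))  # -> 1.0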

📚Dataset

  • AceCode-89K: the first large-scale coding dataset with an average of 16 test cases per prompt, synthesized by GPT-4o-mini (a loading sketch follows this list)
  • AceCodePair-300K: preference pairs constructed from AceCode-89K for training the reward model
  • AceCode-89K-hard: a harder subset; you can sample the hardest 25% of examples via this script
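The datasets are hosted on the Hugging Face Hub under the TIGER-Lab organization. A minimal loading sketch is shown below; the exact dataset ID, split name, and field names are assumptions, so check the dataset card for the real schema.

# Hedged sketch: loading AceCode-89K with the Hugging Face datasets library.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/AceCode-89K", split="train")  # dataset ID assumed
example = ds[0]
print(example.keys())  # e.g. the coding question and its synthesized test cases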

🤗Model

AceCodeRM (Reward Model)

  • AceCodeRM-7B: a reward model trained on AceCodePair-300K from Qwen2.5-Coder-7B-Instruct (a scoring sketch follows this list)
  • AceCodeRM-32B: a reward model trained on AceCodePair-300K from Qwen2.5-Coder-32B-Instruct
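A reward model such as AceCodeRM-7B can be used to rerank sampled solutions (Best-of-N). The sketch below assumes the RM can be loaded as a sequence-classification-style model that returns one scalar score per (question, solution) pair; the actual loading code may differ, so consult the model card.

# Hedged sketch of Best-of-N reranking with a scalar-output reward model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

rm_name = "TIGER-Lab/AceCodeRM-7B"
tokenizer = AutoTokenizer.from_pretrained(rm_name)
model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)  # loading API assumed
model.eval()

def score(question: str, solution: str) -> float:
    """Return a scalar reward for one candidate solution."""
    inputs = tokenizer(question + "\n" + solution, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

def best_of_n(question: str, candidates: list[str]) -> str:
    """Pick the candidate with the highest reward-model score."""
    return max(candidates, key=lambda c: score(question, c))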

AceCoder (RL Model)

| Initial Policy Model | Reward Type | Training Dataset | Final RL Model |
| --- | --- | --- | --- |
| Qwen2.5-7B-Instruct | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM |
| Qwen2.5-7B-Instruct | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule |
| Qwen2.5-Coder-7B-Instruct | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-RM |
| Qwen2.5-Coder-7B-Instruct | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Ins-Rule |
| Qwen2.5-Coder-7B | AceCodeRM-7B | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-RM |
| Qwen2.5-Coder-7B | Rule | AceCode-89K-hard (22k) | TIGER-Lab/AceCoder-Qwen2.5-Coder-7B-Base-Rule |
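To quickly try one of the released RL models, a standard transformers generation loop should work; the sketch below takes the first model name from the table above, and the prompt and generation settings are illustrative, not the values used in the paper.

# Hedged sketch: generating code with one of the released RL policy models.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-RM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))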

📈 Performance

See our website or paper for a detailed performance report.

🚀Quick Start

git submodule init
git submodule update

Training Reward Model

(TODO)

Training RL Model

See train/train_rl/README.md for detailed instructions.

Evaluation

(TODO)

Citation

If you find this work helpful, please consider citing:

@article{AceCoder,
    title={AceCoder: Acing Coder RL via Automated Test-Case Synthesis},
    author={Zeng, Huaye and Jiang, Dongfu and Wang, Haozhe and Nie, Ping and Chen, Xiaotong and Chen, Wenhu},
    journal={ArXiv},
    year={2025},
    volume={abs/2502.01718}
}
