- Uses Hugging Face TRL for PPO.
- Uses Hugging Face PEFT for LoRA.
- Uses bitsandbytes internally for the 4-bit and 8-bit reference model modes.
- Uses our standalone QLoRA library for QLoRA.
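
As a rough illustration of how these libraries typically fit together, here is a minimal sketch and not this repository's actual code. The function names, the mode strings (`"4bit"`, `"8bit"`) and the LoRA hyperparameters are assumptions made for the example. Only `BitsAndBytesConfig`, `AutoModelForCausalLM.from_pretrained`, `LoraConfig` and `get_peft_model` are real library calls. Loading a quantized model generally requires a CUDA GPU with bitsandbytes installed.

```python
from typing import Any, Dict, Optional


def quantization_kwargs(mode: Optional[str]) -> Dict[str, Any]:
    """Map a reference-model mode name to BitsAndBytesConfig keyword arguments.

    The mode names are hypothetical; they are not this repository's flags.
    """
    if mode is None:
        return {}
    if mode == "4bit":
        return {"load_in_4bit": True}
    if mode == "8bit":
        return {"load_in_8bit": True}
    raise ValueError(f"unknown quantization mode: {mode!r}")


def load_reference_model(model_name: str, mode: Optional[str] = None):
    """Load a reference model, optionally quantized through bitsandbytes."""
    # Imported lazily so the pure-Python helper above works without these packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    kwargs = quantization_kwargs(mode)
    quant_config = BitsAndBytesConfig(**kwargs) if kwargs else None
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=quant_config
    )
    model.eval()  # the reference model is only used for inference
    return model


def wrap_policy_with_lora(model, rank: int = 16):
    """Attach LoRA adapters to a causal LM (illustrative hyperparameters)."""
    from peft import LoraConfig, get_peft_model

    config = LoraConfig(
        r=rank, lora_alpha=2 * rank, lora_dropout=0.05, task_type="CAUSAL_LM"
    )
    return get_peft_model(model, config)
```

Keeping the mode-to-arguments mapping as plain data makes it easy to check without a GPU; the heavier model loading stays in separate functions.
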
Launch jobs with:
./job_sets/launch_sets.py <job_set_name>
Check the status with:
./job_sets/check_status.py
The reinforcement learning code is located here:
./with_trl/launch.py <experiment_name>
./approach_sft/launch.py <experiment_name>