Ultimate Volleyball is a multi-agent reinforcement learning environment built on Unity ML-Agents.
See 'Ultimate Volleyball Micro-Machine Learning Course' for an updated step-by-step micro-course.
Version: Up-to-date with ML-Agents Release 19
- Install the Unity ML-Agents toolkit (Release 19+) by following the installation instructions.
- Download or clone this repo containing the `ultimate-volleyball` Unity project.
- Open the `ultimate-volleyball` project in Unity (Unity Hub → Projects → Add → Select root folder for this repo).
- Load the `VolleyballMain` scene (Project panel → Assets → Scenes → `VolleyballMain.unity`).
- Click the ▶ button at the top of the window. This will run the agent in inference mode using the provided baseline model.
1. If you previously changed Behavior Type to `Heuristic Only`, ensure that the Behavior Type is set back to `Default` (see Heuristic Mode).
2. Activate the virtual environment containing your installation of `ml-agents`.
3. Make a copy of the provided training config file in a convenient working directory.
4. Run from the command line: `mlagents-learn <path to config file> --run-id=<some_id> --time-scale=1`
   - Replace `<path to config file>` with the actual path to the file from Step 3.
   - Replace `<some_id>` with a unique identifier for this training run.
5. When you see the message "Start training by pressing the Play button in the Unity Editor", click ▶ within the Unity GUI.
6. From another terminal window, navigate to the same directory you ran Step 4 from, and run `tensorboard --logdir results` to observe the training process.
For more detailed instructions, check the ML-Agents getting started guide.
To enable self-play:
- Set either the Purple or Blue Agent Team ID to 1.
- Include the self-play hyperparameter hierarchy in your trainer config file, or use the provided file in `config/Volleyball_SelfPlay.yaml` (see the ML-Agents documentation). A sketch of this hierarchy is shown after this list.
- Set your reward function in `ResolveEvent()` in `VolleyballEnvController.cs`.
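For reference, here is a minimal sketch of what the self-play hierarchy in an ML-Agents trainer config can look like. The behavior name `Volleyball` and the specific values are assumptions for illustration; `config/Volleyball_SelfPlay.yaml` contains the settings actually used in this project, and the other trainer settings (hyperparameters, network settings, reward signals) are omitted here.

```yaml
behaviors:
  Volleyball:                  # behavior name assumed; must match the agent's Behavior Name in Unity
    trainer_type: ppo
    max_steps: 60000000
    self_play:
      save_steps: 50000        # trainer steps between policy snapshots
      team_change: 200000      # steps before the learning team switches
      swap_steps: 2000         # steps between swapping the opponent's snapshot
      window: 10               # number of past snapshots opponents are sampled from
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
```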
Goal: Get the ball to bounce in the opponent's side of the court while preventing the ball from bouncing into your own court.
Action space:
4 discrete action branches:
- Forward motion (3 possible actions: forward, backward, no action)
- Rotation (3 possible actions: rotate left, rotate right, no action)
- Side motion (3 possible actions: left, right, no action)
- Jump (2 possible actions: jump, no action)
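To illustrate how these branches map onto agent behaviour, here is a hedged C# sketch of an `OnActionReceived` handler. The branch ordering follows the list above, but the action-index mapping and force values are assumptions, not the project's actual agent implementation.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Illustrative only: index-to-action mapping and force values are assumptions.
public class VolleyballActionSketch : Agent
{
    Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        var act = actions.DiscreteActions;

        // Branch 0 - forward motion: 0 = no action, 1 = forward, 2 = backward
        float forward = act[0] == 1 ? 1f : act[0] == 2 ? -1f : 0f;
        // Branch 1 - rotation: 0 = no action, 1 = rotate left, 2 = rotate right
        float turn = act[1] == 1 ? -1f : act[1] == 2 ? 1f : 0f;
        // Branch 2 - side motion: 0 = no action, 1 = left, 2 = right
        float side = act[2] == 1 ? -1f : act[2] == 2 ? 1f : 0f;
        // Branch 3 - jump: 0 = no action, 1 = jump
        bool jump = act[3] == 1;

        transform.Rotate(0f, turn * 200f * Time.fixedDeltaTime, 0f);
        Vector3 move = transform.forward * forward + transform.right * side;
        rb.AddForce(move * 2f, ForceMode.VelocityChange);
        if (jump) rb.AddForce(Vector3.up * 8f, ForceMode.VelocityChange);
    }
}
```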
Observation space:
Total size: 11
- Agent Y-rotation (1)
- Normalised directional vector from agent to ball (3)
- Distance from agent to ball (1)
- Agent X, Y, Z velocity (3)
- Ball X, Y, Z relative velocity (3)
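A hedged C# sketch of how these 11 values could be collected in `CollectObservations`; the field names (`ball`, `agentRb`, `ballRb`) are placeholders rather than the project's actual ones.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Illustrative only: field names are placeholders.
public class VolleyballObservationSketch : Agent
{
    public Transform ball;   // assigned in the Inspector
    Rigidbody agentRb;
    Rigidbody ballRb;

    public override void Initialize()
    {
        agentRb = GetComponent<Rigidbody>();
        ballRb = ball.GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 toBall = ball.position - transform.position;

        sensor.AddObservation(transform.rotation.y);                // agent Y-rotation (1)
        sensor.AddObservation(toBall.normalized);                   // normalised direction to ball (3)
        sensor.AddObservation(toBall.magnitude);                    // distance to ball (1)
        sensor.AddObservation(agentRb.velocity);                    // agent velocity (3)
        sensor.AddObservation(ballRb.velocity - agentRb.velocity);  // ball velocity relative to agent (3)
    }
}
```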
Reward function:
The project contains some examples of how the reward function can be defined. The base example gives a +1 reward each time the agent hits the ball over the net.
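As a hedged illustration of what that base example could look like inside `ResolveEvent()`: the event names and controller fields below are hypothetical placeholders; the actual events and fields are defined in `VolleyballEnvController.cs`.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical event cases; the project's actual Event enum may differ.
public enum Event
{
    HitIntoBlueArea,    // ball hit over the net into the blue side
    HitIntoPurpleArea,  // ball hit over the net into the purple side
    HitBlueGoal,        // ball bounced in the blue court
    HitPurpleGoal       // ball bounced in the purple court
}

public class VolleyballEnvControllerSketch : MonoBehaviour
{
    public Agent blueAgent;
    public Agent purpleAgent;

    public void ResolveEvent(Event triggerEvent)
    {
        switch (triggerEvent)
        {
            case Event.HitIntoBlueArea:
                purpleAgent.AddReward(1f);   // purple hit the ball over the net
                break;
            case Event.HitIntoPurpleArea:
                blueAgent.AddReward(1f);     // blue hit the ball over the net
                break;
            case Event.HitBlueGoal:
            case Event.HitPurpleGoal:
                // Rally over: a terminal win/lose reward could be added here
                // before resetting the scene.
                blueAgent.EndEpisode();
                purpleAgent.EndEpisode();
                break;
        }
    }
}
```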
The following baselines are included:
- `Volleyball_Random.onnx` - Random agent
- `Volleyball_SelfPlay.onnx` - Trained using PPO with Self-Play in 60M steps
- `Volleyball.onnx` - Trained using PPO in 60M steps (without Self-Play)