Self-Driving Racecar with Proximal Policy Optimization

Solving the OpenAI Gym CarRacing-v0 environment using Proximal Policy Optimization.

Demo

See the full video demo on YouTube.

After 5000 training steps, the agent achieves a mean score of 909.48±10.30 over 100 episodes. To reproduce the results, run the following commands:

mkdir logs
python demo.py --ckpt extra/final_weights.pt --delay_ms 0

Per-episode rewards are saved to logs/episode_rewards.csv.
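A score in the mean ± standard deviation form quoted above can be recomputed from the saved file with a few lines of standard-library Python. This is a minimal sketch, not part of the repository: the helper names `load_rewards` and `summarize` are hypothetical, and the CSV layout is an assumption (the reward is taken from the last field of each row, and non-numeric rows such as a header are skipped). Whether the reported spread is a sample or population standard deviation is not stated, so the sketch uses the sample version.

```python
import csv
import statistics


def load_rewards(path):
    """Read one reward per row. ASSUMPTION: the reward is the last field of each row."""
    rewards = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # ignore blank lines
            try:
                rewards.append(float(row[-1]))
            except ValueError:
                continue  # skip a header or any other non-numeric row
    return rewards


def summarize(rewards):
    """Return (mean, sample standard deviation) of the episode rewards."""
    return statistics.mean(rewards), statistics.stdev(rewards)
```

After a demo run, `summarize(load_rewards("logs/episode_rewards.csv"))` would return the two numbers to compare against the reported score, provided the file matches the assumed layout.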

A convolutional neural network jointly approximates the value function and the policy.
The network is optimized with Proximal Policy Optimization (PPO).
The policy head outputs the parameters of a Beta distribution, whose bounded support suits bounded continuous action spaces.
Advantages are estimated with Generalized Advantage Estimation (GAE).
Four consecutive frames are concatenated to form the network input, with frame skipping optionally applied.
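The Generalized Advantage Estimation step listed above can be sketched in plain Python. This is an illustrative version under stated assumptions, not the repository's implementation: the function name `compute_gae`, its argument layout, and the default `gamma` and `lam` values are all hypothetical. It computes the one-step TD error at each timestep and accumulates it backwards with decay `gamma * lam`, resetting at episode boundaries.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for a rollout of length T.

    rewards[t], values[t], dones[t] describe step t; last_value is the
    value estimate of the state reached after the final step.
    gamma and lam defaults are illustrative, not the author's settings.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate advantage across an episode end.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic: advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `lam=0` reduces the estimate to the one-step TD error, while `lam=1` recovers the discounted Monte Carlo return minus the value baseline; intermediate values trade bias against variance.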