A multi-GPU implementation of the A3C algorithm from the paper Asynchronous Methods for Deep Reinforcement Learning.
Results of the same code trained on 47 different Atari games were uploaded to OpenAI Gym. You can see them on my gym page. Most of them are the best reproducible results on gym.
To train a model:
CUDA_VISIBLE_DEVICES=0 ./train-atari.py --env Breakout-v0
The speed is about 6~10 iterations/s on 1 GPU plus 12+ CPU cores. In each iteration it trains on a batch of 128 new states. The network architecture is larger than what's used in the original paper.
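As a rough check of what these numbers mean, the sketch below converts iterations per second into states consumed per second. It uses only the figures quoted above (6 to 10 iterations/s, 128 new states per batch); the helper name `states_per_second` is made up for illustration and is not part of the training script.

```python
BATCH_SIZE = 128  # new states per training iteration, as stated above


def states_per_second(iterations_per_second, batch_size=BATCH_SIZE):
    """Hypothetical helper: states consumed by training each second."""
    return iterations_per_second * batch_size


low, high = states_per_second(6), states_per_second(10)
print(f"{low} to {high} states/s")  # 768 to 1280 states/s
print(f"{low * 3600 / 1e6:.2f} to {high * 3600 / 1e6:.2f} million states/hour")
```

At that rate a single GPU consumes roughly 2.8 to 4.6 million states per hour, which gives a sense of how much data the simulators have to supply.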
The pre-trained models were all trained with 4 GPUs for about 2 days, but on simple games like Breakout you can get good performance within several hours. Also note that using multiple GPUs does not give an obvious speedup here, because the bottleneck in this implementation is data rather than computation.
Some practical notes:
- Occasionally, processes may not be terminated completely. It is suggested to use systemd-run to run any multiprocess Python program, so that the task gets a dedicated cgroup.
- Training at a significantly slower speed (e.g. on CPU) will result in a very bad score, probably because of async issues.
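One possible way to follow the systemd-run suggestion is sketched below. It only builds and prints the command line; it assumes a systemd user session is available, and the helper `systemd_scope_command` is hypothetical. `--user` and `--scope` are standard systemd-run options that run the command in a transient scope unit with its own cgroup.

```python
import shlex


def systemd_scope_command(argv):
    """Hypothetical helper: prefix argv so it runs in a transient user scope.

    Assumes systemd-run is installed and a user session is running.
    """
    return ["systemd-run", "--user", "--scope"] + list(argv)


train = ["./train-atari.py", "--env", "Breakout-v0"]
cmd = systemd_scope_command(train)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Because every worker process then lives in the same scope, leftover processes can be stopped together by stopping that unit instead of hunting them down one by one.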
To run a pre-trained agent for 100 episodes, download a model from the model zoo and then run:
ENV=Breakout-v0; ./run-atari.py --load "$ENV".tfmodel --env "$ENV" --episode 100 --output output_dir
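To evaluate several downloaded models in one go, the command above can be generated per game. The sketch below only prints the command lines. It reuses the flags shown above (`--load`, `--env`, `--episode`, `--output`) and assumes that other games follow the same `<Game>-v0` id and `<id>.tfmodel` file naming as Breakout; the helper name and the per-game output directories are illustrative choices.

```python
import shlex


def eval_command(env, episodes=100, output_dir="output_dir"):
    """Hypothetical helper: argv for evaluating one pre-trained model."""
    return [
        "./run-atari.py",
        "--load", f"{env}.tfmodel",   # model file named after the env id
        "--env", env,
        "--episode", str(episodes),
        "--output", output_dir,
    ]


# Pong and Seaquest are assumed to use the same naming pattern as Breakout.
for game in ["Breakout", "Pong", "Seaquest"]:
    env = f"{game}-v0"
    print(shlex.join(eval_command(env, output_dir=f"output_{game}")))
```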
Models are available for the following Atari environments (click to watch videos of my agent):
- AirRaid (this one is flickering due to gym settings)
- Alien
- Amidar
- Assault
- Asterix
- Asteroids
- Atlantis
- BankHeist
- BattleZone
- BeamRider
- Berzerk
- Breakout
- Carnival
- Centipede
- ChopperCommand
- CrazyClimber
- DemonAttack
- DoubleDunk
- ElevatorAction
- FishingDerby
- Frostbite
- Gopher
- Gravitar
- IceHockey
- Jamesbond
- JourneyEscape
- Kangaroo
- Krull
- KungFuMaster
- MsPacman
- NameThisGame
- Phoenix
- Pong
- Pooyan
- Qbert
- Riverraid
- RoadRunner
- Robotank
- Seaquest
- SpaceInvaders
- StarGunner
- Tennis
- Tutankham
- UpNDown
- VideoPinball
- WizardOfWor
- Zaxxon
Note that the Atari game settings in gym are quite different from those in the DeepMind papers, so the scores are not comparable. The most notable differences are:
- In gym, each action is randomly repeated 2~4 times.
- In gym, inputs are RGB instead of greyscale.
- In gym, an episode is limited to 10000 steps.
- The action space also seems to be different.
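The first two differences can be illustrated with a toy example. This is a minimal sketch and not gym's actual code: `CountingEnv` is a made-up stand-in for the emulator, `RandomRepeat` assumes the repeat count is drawn uniformly from 2, 3 and 4, and `rgb_to_grey` uses the common ITU-R BT.601 luma weights as one possible greyscale conversion.

```python
import random


class CountingEnv:
    """Toy stand-in for an emulator: every frame yields reward 1."""

    def __init__(self):
        self.frames = 0

    def step(self, action):
        self.frames += 1
        return self.frames, 1.0, False  # observation, reward, done


class RandomRepeat:
    """Repeat each action for a random 2, 3 or 4 frames and sum the rewards."""

    def __init__(self, env, rng, choices=(2, 3, 4)):
        self.env, self.rng, self.choices = env, rng, choices

    def step(self, action):
        obs, total, done = None, 0.0, False
        for _ in range(self.rng.choice(self.choices)):
            obs, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        return obs, total, done


def rgb_to_grey(pixel):
    """Luma-weighted greyscale value of one (r, g, b) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


env = RandomRepeat(CountingEnv(), random.Random(0))
rewards = [env.step(0)[1] for _ in range(5)]
print(rewards)                  # each entry is 2.0, 3.0 or 4.0
print(rgb_to_grey((255, 0, 0)))  # red contributes about 30% of the brightness
```

Since the agent cannot control how many frames each action lasts, the same policy faces a noisier environment than with a fixed repeat count, which is one reason the scores are not directly comparable.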
Also see the DQN implementation here