This repo contains Python code for replicating the asynchronous advantage actor-critic (A3C) algorithm as described in https://arxiv.org/pdf/1602.01783.pdf

Dependencies:
- tensorflow
- scipy
- gym (Atari)
- scikit-image (skimage)
To train A3C on BreakoutDeterministic-v3 using 8 parallel actor-learner threads, run the following command:
python a3c.py --game BreakoutDeterministic-v3 --num_concurrent 8
To test a trained A3C agent, run the following command:
python a3c.py --game BreakoutDeterministic-v3 --checkpoint_path path_to_checkpoint --testing True
Below you can find 2 plots of A3C training on Breakout and Pong.
Full explanation can be found here: https://papoudakis.github.io/announcements/policy_gradient_a3c/
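
As a brief illustration of the quantity each actor-learner thread works with in the paper, here is a minimal sketch of n-step discounted returns and the resulting advantage estimates. It is not taken from `a3c.py`: the function names `n_step_returns` and `advantages`, and the default `gamma=0.99`, are assumptions chosen for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} for one rollout segment.

    bootstrap_value is the critic's estimate V(s_{t+n}) for the state
    after the last reward, or 0.0 if the episode ended there.
    """
    returns = []
    running = bootstrap_value
    # Walk the rollout backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimate A_t = R_t - V(s_t) for each step of the rollout."""
    return [ret - value for ret, value in zip(returns, values)]


# Hypothetical 3-step rollout that ended in a terminal state.
rollout_returns = n_step_returns([1.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
rollout_advantages = advantages(rollout_returns, [1.0, 0.5, 0.5])
```

In the paper's formulation, the advantage weights the policy-gradient term, while the squared difference between the return and the value estimate drives the critic's loss.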
https://github.com/miyosuda/async_deep_reinforce