This implementation follows the paper "Asynchronous Methods for Deep Reinforcement Learning" (https://arxiv.org/pdf/1602.01783.pdf).
$ python cartpole_a3c.py --device=cpu --episodes=1000 --workers=4 --log_dir=cartpole_logs
The following graph shows the episode rewards (# workers: 4, entropy loss: 0.2)
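The "entropy loss" value in the caption is presumably the weight on the entropy regularization term described in the paper; that reading is an assumption, since the training script is not shown here. Under that assumption, a minimal sketch of how such a weight enters the per-step policy loss could look like this. The function names (`softmax`, `entropy`, `policy_loss_with_entropy`) are hypothetical and not taken from `cartpole_a3c.py`.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def policy_loss_with_entropy(log_prob_action, advantage, probs, entropy_weight=0.2):
    """Policy-gradient loss for one step with an entropy bonus.

    entropy_weight is assumed to play the role of the "entropy loss" value
    in the caption. Subtracting the weighted entropy rewards policies that
    keep exploring instead of collapsing onto one action too early.
    """
    return -log_prob_action * advantage - entropy_weight * entropy(probs)


# Two-action policy (as in CartPole) with equal scores for both actions.
probs = softmax([0.0, 0.0])
loss = policy_loss_with_entropy(math.log(probs[0]), advantage=1.0, probs=probs)
print(probs, loss)
```

A larger weight pushes the policy toward uniform action probabilities, which may help on sparse-reward tasks at the cost of slower convergence.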
TensorBoard:
$ tensorboard --logdir=cartpole_logs/
$ python acrobot_a3c.py --device=cpu --episodes=500 --workers=4 --log_dir=acrobot_logs
The following graph shows the episode rewards (# workers: 4, entropy loss: 0.2)
$ python mountaincar_a3c.py --device=cpu --episodes=20000 --workers=8 --log_dir=mc_logs
The following graph shows the episode rewards (# workers: 8, entropy loss: 1.0, tmax: 5)
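Here tmax presumably corresponds to t_max in the paper: the maximum number of environment steps a worker takes before it computes an update. A minimal sketch of the n-step return computation over one such segment, following the paper's description, is shown below. The function name `n_step_returns` and the `gamma` values are illustrative assumptions, not taken from `mountaincar_a3c.py`.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout segment, computed backwards.

    rewards: rewards collected over at most t_max steps.
    bootstrap_value: the critic's value estimate for the state reached after
        the last step, or 0.0 if the episode terminated there.
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


# A 5-step segment (t_max = 5) with reward 1 per step, ending in a terminal state.
print(n_step_returns([1.0] * 5, bootstrap_value=0.0, gamma=0.5))
# → [1.9375, 1.875, 1.75, 1.5, 1.0]
```

Each return minus the critic's value estimate for the same state gives the advantage used in the policy update.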
- OpenAI's A3C implementation (https://github.com/openai/universe-starter-agent)
- Arthur Juliani's blog post (https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2)