A PPO (Proximal Policy Optimization) reinforcement learning solution to the Unity ML-Agents (Udacity) Crawler environment.
Crawler is a creature with 4 arms and 4 forearms that needs to learn how to stand up and walk forward without falling. The environment contains 12 agents, and each one controls the target rotations for the joints and head of a crawler through 20 continuous actions.
The state consists of 129 float values representing the position, rotation, velocity, and angular velocity of each limb.
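The snippet below is a minimal sketch of how an agent interacts with this environment through the Udacity `unityagents` wrapper. The environment path and the random-action policy are placeholders for illustration; this is not the code used by this repository.

```python
import numpy as np
from unityagents import UnityEnvironment

# Path to the downloaded Crawler build (placeholder, see the setup instructions below)
env = UnityEnvironment(file_name="path/to/Crawler")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
states = env_info.vector_observations          # shape (12, 129): one state per crawler
num_agents = len(env_info.agents)              # 12 parallel crawlers
action_size = brain.vector_action_space_size   # 20 continuous actions per crawler

# One environment step with random actions clipped to [-1, 1]
actions = np.clip(np.random.randn(num_agents, action_size), -1.0, 1.0)
env_info = env.step(actions)[brain_name]
rewards = env_info.rewards                     # one reward per crawler
dones = env_info.local_done                    # per-crawler episode-termination flags
env.close()
```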
To set up your Python environment and run the code in this repository, follow the instructions below.
Create (and activate) a new environment with Python 3.6.
- Linux or Mac:
conda create --name ddpg-rl python=3.6
source activate ddpg-rl
- Windows:
conda create --name ddpg-rl python=3.6
activate ddpg-rl
Clone the repository and install dependencies
git clone https://github.com/kotsonis/ddpg-crawler.git
cd ddpg-crawler
pip install -r requirements.txt
pip install tensorflow==1.15
Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
Place the file in the `ddpg-crawler` folder, and unzip (or decompress) the file.
crawler.py reads hyperparameters from command-line options, which you can use to modify training parameters and/or set saving options. You can list the CLI options by running
python crawler.py -h
A typical training invocation is shown below:
python crawler.py --train --trajectories 2001 --policy_optimization_epochs 160 \
--entropy_beta 0.002 --vf_coeff 0.05 --memory_batch_size 512 --actor_lr 8e-5 --gamma 0.95 \
--env [path to Crawler environment]
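For reference, the sketch below shows a generic PPO clipped-surrogate objective and where hyperparameters such as `entropy_beta` and `vf_coeff` typically enter. It is an illustration of the technique; the function name, signature, and the `clip_eps` default are assumptions, not the exact loss implemented in crawler.py (see Report.md for the actual details).

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages, returns, values, entropy,
             clip_eps=0.2, entropy_beta=0.002, vf_coeff=0.05):
    """Generic PPO clipped-surrogate loss (illustrative only)."""
    ratio = torch.exp(new_log_probs - old_log_probs)           # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = vf_coeff * (returns - values).pow(2).mean()   # critic regression to returns
    entropy_bonus = entropy_beta * entropy.mean()               # encourages exploration
    return policy_loss + value_loss - entropy_bonus
```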
or, with default parameters:
python crawler.py --train
You can watch the agent play with the trained model as follows:
python crawler.py --play --notb
You can also specify the number of episodes you want the agent to play, as well as a non-default trained model, as follows:
python crawler.py --play --notb --load ./model/model_saved.pt --episodes 20
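If you want to drive a trained policy outside of crawler.py, an evaluation loop looks roughly like the sketch below. The environment path is a placeholder and the random `act` function stands in for the trained actor; the actual network class and checkpoint format are defined in this repository.

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="path/to/Crawler")
brain_name = env.brain_names[0]

# Placeholder policy: replace with the actor restored from a saved checkpoint
def act(states):
    return np.clip(np.random.randn(states.shape[0], 20), -1.0, 1.0)

for episode in range(20):
    env_info = env.reset(train_mode=False)[brain_name]   # train_mode=False renders the crawlers
    states = env_info.vector_observations
    scores = np.zeros(len(env_info.agents))
    while True:
        env_info = env.step(act(states))[brain_name]
        states = env_info.vector_observations
        scores += env_info.rewards
        if np.any(env_info.local_done):
            break
    print(f"Episode {episode + 1}: mean score {scores.mean():.2f}")
env.close()
```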
You can read about the implementation details and the results obtained in Report.md.