CS295 Reinforcement Learning Mini Project

Sample Efficient Microsatellite Attitude Control using Deep Reinforcement Learning with Unity and OpenAI Gym

Simulation Results

The simulations below show a trained agent for each of the listed methods. Training lasted 17K episodes (5.1M timesteps) and took about 2 full days.

Soft Actor-Critic V1 (SACv1)

Twin-Delayed DDPG (TD3)

Soft Actor-Critic V2 (SACv2)

TD3 with Prioritized Experience Replay (TD3-PER)

Setting up dependencies

Create a virtual environment (or use a PyTorch Docker image)

 virtualenv venv -p python3.6
 source venv/bin/activate

Install dependencies with `python setup.py develop`
Create the directories needed for training:
- tmp: directory of trained models
- unity_environments: directory of unity executable environment
- wandb: for wandb logs when training
```
  mkdir tmp
  mkdir unity_environments
  mkdir wandb
```
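For setups where the directories are created from a script rather than the shell, the same step can be sketched in Python. The directory names come from the list above; the helper name `create_training_dirs` and the `root` parameter are illustrative assumptions, not part of the project.

```python
from pathlib import Path

# Directory names taken from the README; the helper itself is hypothetical.
TRAINING_DIRS = ("tmp", "unity_environments", "wandb")


def create_training_dirs(root="."):
    """Create the training directories under `root` and return their paths."""
    created = []
    for name in TRAINING_DIRS:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing checkout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_training_dirs()` from the repository root has the same effect as the three `mkdir` commands, except that it does not fail when a directory already exists.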
Download the Unity executable from source and extract it into the unity_environments folder
Change the permissions of the folder containing the Unity executable
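The README does not say which permissions are required. A minimal sketch, assuming the goal is to make the extracted files executable (roughly what `chmod -R a+x unity_environments` does), could look like this. The function name `make_tree_executable` is hypothetical.

```python
import os
import stat
from pathlib import Path


def make_tree_executable(folder):
    """Add execute permission for user, group, and others to `folder` and its contents."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    root = Path(folder)
    for path in [root, *root.rglob("*")]:
        # Keep the existing permission bits and only add the execute bits.
        current = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, current | exec_bits)
```

For example, `make_tree_executable("unity_environments")` would be run once after extracting the download. If a narrower permission set is preferred, the same loop can be restricted to the executable file itself.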
Depending on the selected DRL algorithm (e.g. TD3, SAC, SACv2, TD3-PER), change the hyperparameters and environment config in the YAML file located in config/train
Train the model
```
   python bin/train/train_<DRL_ALGO>.py
```
The following DRL algorithms are available (only the working ones are listed):
- sac: Soft Actor-Critic V1
- sacv2: Soft Actor-Critic V2
- td3: Twin Delayed DDPG
- td3_per: TD3 with Prioritized Experience Replay
Once the model is trained, update the test settings in the YAML config inside config/test.
Run the test simulation using the command below. It is best to use a graphical version of the same Unity executable so the agent can be observed.
```
   python bin/test/test_<DRL_ALGO>.py
```
(Optional) You can change the number of episodes inside bin/test/test_<DRL_ALGO>.py
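The train and test commands above follow the same naming pattern, so the script path can be derived from the algorithm name. The sketch below assumes that `<DRL_ALGO>` is replaced by one of the lowercase identifiers in the algorithm list (sac, sacv2, td3, td3_per); the helper name `script_command` is hypothetical and not part of the project.

```python
# Algorithm identifiers as listed in the README (assumed to match the script names).
WORKING_ALGOS = ("sac", "sacv2", "td3", "td3_per")


def script_command(algo, mode="train"):
    """Build the command for bin/<mode>/<mode>_<algo>.py as an argument list."""
    if mode not in ("train", "test"):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    if algo not in WORKING_ALGOS:
        raise ValueError(f"unknown algorithm {algo!r}; expected one of {WORKING_ALGOS}")
    return ["python", f"bin/{mode}/{mode}_{algo}.py"]
```

The returned list can be passed to `subprocess.run` from the repository root, for example `subprocess.run(script_command("td3"), check=True)`. Validating the name first gives a clearer error than a missing-file failure.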

Training Results

The training results are available in this wandb project:
https://wandb.ai/jamesandrewsarmiento/microsat_17K/overview?workspace=user-jamesandrewsarmiento

Paper

The written research paper will be available soon.
