## How to use
Two example scripts are provided, `scripts/train.py` and `scripts/play.py`, which run an RL training pipeline and load an existing policy to run it, respectively. To try them out:

```
python scripts/train.py --task=mini_cheetah --headless
```

and, once training has finished:

```
python scripts/play.py --task=mini_cheetah
```

You should be able to move the mini-cheetah robot around with the WASD keys; see the terminal output for instructions.
When training, the neural network model is saved in `logs/<experiment_name>/<run_name>/`, along with a snapshot of the entire repo. By default, the `<experiment_name>` is loaded from the task config file, and the `<run_name>` is generated from the date and time. When playing, the most recently created model matching the task and experiment is loaded.
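Since run folders are timestamped, "most recent" resolution can be done by directory modification time. The helper below is a minimal sketch of that idea, not the repo's actual loader (which lives in `gym/utils/helpers`); the function name is hypothetical.

```python
from pathlib import Path

def latest_run(experiment_dir: str) -> Path:
    """Return the most recently created run folder under logs/<experiment_name>/.

    Hypothetical helper -- the repo's own loading logic may differ.
    """
    runs = [p for p in Path(experiment_dir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found in {experiment_dir}")
    # Run folders are named by date and time, so the newest modification
    # time corresponds to the most recently created run.
    return max(runs, key=lambda p: p.stat().st_mtime)
```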
The `--headless` argument simply runs train (or play) without a visualization. You can also train with visualization (which is much slower) and pause the visualizer by pressing `v`.
There are several other arguments; for a complete list, see `gym/utils/helpers`. The most relevant ones are:

- `--task`: Specifies which robot/config to load and run.
- `--headless`: Force display off at all times.
- `--resume`: Resume training (loads a saved policy).
- `--load_run`: Name of the run to load when `--resume=True`. Defaults to the most recent.
- `--experiment_name`: Name of the experiment to run or load. Defaults to the most recent.
- `--record`: Record the IsaacGym simulation at real-time speed. Note: especially on Ubuntu, use VLC to play the recording (some media players don't handle it well).
- `--original_cfg`: When loading a policy, use the original config file used to train it instead of the current one.
- `--checkpoint`: Saved model checkpoint number. Defaults to the most recent.
- `--run_name`: Assign a custom name to the run (used for the save folder name and WandB tracking).
- `--num_envs`: Override the number of environments to create.
- `--seed`: Set the random seed.
- `--max_iterations`: Set the number of training iterations.
- `--wandb_project`: Name of your project for WandB tracking.
- `--wandb_entity`: Your WandB entity (username) to track the experiment on your account.
- `--wandb_sweep_id`: A WandB sweep ID to continue an existing sweep.
- `--wandb_sweep_config`: Name of a JSON config file for the WandB sweep.
- `--disable_wandb`: Disable WandB logging for debugging.
Weights & Biases (WandB) is integrated with both normal single-run training and sweeps, through the `train.py` and `sweep.py` scripts. By default, WandB is disabled and no training progress is logged while a network trains; in this mode, networks and source code are only saved locally to the `logs` directory.
To enable WandB, two pieces of information must be given to the script: the WandB entity name, which is the username of the account to log to, and a WandB project name, which is the name under which all training data will be saved in the WandB console.
There are two ways to pass this information to the scripts. The first is by command-line argument, using the flags `--wandb_entity=todo` and `--wandb_project=todo`, replacing "todo" with the relevant information for your run. The second is by JSON config file. An example config file, `gpuGym/user/wandb_config_example.json`, is included in the directory but must be renamed to `wandb_config.json` for the scripts to read it. By default, `wandb_config.json` is ignored by git via the `.gitignore` file so it doesn't pollute the repository.
The scripts first look for a command-line argument and then check for a JSON config file. Command-line settings have the highest priority and override the JSON config.
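This precedence can be sketched as a small merge step. The function and the JSON key names ("entity", "project") below are assumptions for illustration, not necessarily the repo's exact schema:

```python
import json
from pathlib import Path

def resolve_wandb_settings(cli_entity=None, cli_project=None,
                           config_path="user/wandb_config.json"):
    """Merge WandB settings: command-line flags override the JSON config file.

    Illustrative sketch only; key names are hypothetical.
    """
    file_cfg = {}
    p = Path(config_path)
    if p.is_file():
        file_cfg = json.loads(p.read_text())
    # CLI values win when given; otherwise fall back to the file, if any.
    entity = cli_entity or file_cfg.get("entity")
    project = cli_project or file_cfg.get("project")
    return entity, project
```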
A sweep automatically trains multiple runs with different hyperparameter settings. Sweeps are controlled by WandB, so creating an account and enabling it for your setup is required.
A sweep has two main facilitators: the sweep controller and the sweeping agents. A single controller is created for each sweep and selects the hyperparameters to test for each individual run. The controller is created by the `sweep.py` script at the start of a sweep but lives in the cloud on WandB's servers. It consumes a sweep config, specified in a `sweep_config.json` file, and follows the prescribed behavior to pass parameters to the sweeping agents. A sweeping agent is an individual training run that receives its settings from the sweep controller. When the `sweep.py` script is run, a single sweep controller is created, which returns a sweep_id for the sweep; an agent is then created as another process that consumes the sweep_id and receives its settings from the controller before performing the training run. Once finished, an agent fully shuts down and the `sweep.py` script creates another agent to perform the next run (if there are still runs left to complete).
In addition to running agents sequentially on a single desktop, other computers connected to WandB can also run agents to share the workload. Given the sweep_id that the sweep controller creates, any other computer connected to your WandB account can use that ID to create more agents to run in parallel. With the command-line argument `--wandb_sweep_id=todo`, the `sweep.py` script will not create a new sweep controller but instead communicate with the existing controller to request parameters for another agent to train. This can be done on multiple computers to parallelize the sweep across many systems. Note: multiple agents can train simultaneously on the same machine (VRAM permitting), but in general this doesn't improve throughput much over running sequentially, since processing speed is the main limitation on a single machine.
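The controller/agent split described above maps onto WandB's public API roughly as follows. This is a minimal sketch, not the repo's `sweep.py`: the metric name, parameter, and project are placeholders, and the sweep-config keys follow WandB's documented schema.

```python
# Sketch of a sweep: one cloud-side controller, one local agent.
sweep_config = {
    "method": "random",  # search strategy the controller uses
    "metric": {"name": "mean_reward", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
    },
}

def train():
    import wandb  # imported lazily so the sketch reads without wandb installed
    # Each agent run: wandb.init() pulls the hyperparameters chosen by the
    # cloud-side controller into wandb.config.
    with wandb.init():
        lr = wandb.config.learning_rate
        # ... launch the actual RL training with this learning rate ...

def main(existing_sweep_id=None):
    import wandb
    # With no sweep_id, create a new controller on WandB's servers;
    # otherwise join the existing sweep (what --wandb_sweep_id does).
    sweep_id = existing_sweep_id or wandb.sweep(
        sweep_config, project="wandb_test_project")
    # The agent requests settings from the controller, runs train(), exits.
    wandb.agent(sweep_id, function=train, count=1)
```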
If you would like to create multiple `sweep_config.json` files, you can name them however you like and use the `--wandb_sweep_config=todo` command-line argument to select which JSON file defines the sweep.
Manually setting the WandB project and entity:

```
python gym/scripts/train.py --task=mini_cheetah --wandb_entity=myID --wandb_project=wandb_test_project
```

Using a `wandb_config.json` file:

```
python gym/scripts/sweep.py --task=mini_cheetah --headless
```

Selecting a config file by name:

```
python gym/scripts/sweep.py --task=mini_cheetah --headless --wandb_sweep_config=sweep_config_example.json
```

Using the entity name from `wandb_config.json` but overriding the project name:

```
python gym/scripts/train.py --task=a1 --wandb_project=wandb_test_project
```