Commit f580646: drafts for doc pages

TheEimer committed May 30, 2024 · 1 parent 1976f4c commit f580646
Showing 16 changed files with 189 additions and 75 deletions.
2 changes: 2 additions & 0 deletions docs/advanced_usage/algorithm_states.rst
@@ -0,0 +1,2 @@
Using the ARLBench States
==========================
2 changes: 2 additions & 0 deletions docs/advanced_usage/autorl_paradigms.rst
@@ -0,0 +1,2 @@
ARLBench and Different AutoRL Paradigms
=======================================
2 changes: 2 additions & 0 deletions docs/advanced_usage/dynamic_configuration.rst
@@ -0,0 +1,2 @@
Dynamic Configuration in ARLBench
==================================
4 changes: 2 additions & 2 deletions docs/advanced_usage/index.rst
@@ -1,5 +1,5 @@
Advanced Usage
==============
Advanced Configuration Options
==============================

.. toctree::
:hidden:
2 changes: 2 additions & 0 deletions docs/basic_usage/env_subsets.rst
@@ -0,0 +1,2 @@
The ARLBench Subsets
====================
18 changes: 14 additions & 4 deletions docs/basic_usage/index.rst
@@ -1,11 +1,21 @@
Basic Usage
===========
Benchmarking AutoRL Methods
============================

.. toctree::
:hidden:
:maxdepth: 2
objectives
env_subsets
seeding


.. warning::
ARLBench provides a basis for benchmarking different AutoRL methods. This section of the documentation focuses on black-box hyperparameter optimization, since it is the simplest use case of ARLBench.
We discuss the structure of ARLBench, the currently supported objectives, the environment subsets and search spaces we provide, and the seeding of experiments on their own subpages.
The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:

1. Decide which RL algorithms to use as your HPO targets. Ideally, you will use all three: PPO, DQN, and SAC.
2. Decide which AutoRL methods you want to benchmark.
3. Decide which objectives you want to optimize for. We provide a variety of objectives from which you can select one or more.
4. Use the pre-defined search spaces to run your AutoRL method for several runs (see the sketch below). If there is a good reason to deviate from these search spaces, please report it alongside your results.
5. Evaluate the best configuration found on the environment test seeds and report this result.
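
For step 4, a minimal sketch using the Hypersweeper-based example script ``run_arlbench.py`` and the configs shipped in the repository's ``examples`` directory (names taken from the examples README; adapt them to your own setup):

.. code-block:: bash

   # Sweep the pre-defined PPO search space on CartPole with SMAC via Hypersweeper
   python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc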

This page is under construction.
2 changes: 2 additions & 0 deletions docs/basic_usage/objectives.rst
@@ -0,0 +1,2 @@
Objectives in ARLBench
======================
2 changes: 2 additions & 0 deletions docs/basic_usage/options.rst
@@ -0,0 +1,2 @@
ARLBench Options
================
2 changes: 2 additions & 0 deletions docs/basic_usage/seeding.rst
@@ -0,0 +1,2 @@
Considerations for Seeding
============================
41 changes: 0 additions & 41 deletions docs/commands.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/index.rst
@@ -9,7 +9,6 @@ Home
examples/index
basic_usage/index
advanced_usage/index
commands
api
glossary
faq
4 changes: 2 additions & 2 deletions docs/installation.rst
@@ -9,15 +9,15 @@ Either way, you will likely want to first create a virtual environment to instal
conda create -n arlbench python=3.10
conda activate arlbench
1. Using PyPI
### 1. Using PyPI

This is the simplest way to install ARLBench. Just run the following command:

.. code-block:: bash
pip install arlbench
2. Downloading from GitHub
### 2. Downloading from GitHub

This is the best way to install the latest version of ARLBench. First, clone the repository:

126 changes: 124 additions & 2 deletions examples/Readme.md
@@ -10,7 +10,7 @@ The "hypersweeper_tuning" and "schedules" notebooks can help you run these examp

## 1. Black-Box HPO

We use the 'hypersweeper' package to demonstrate how ARLBench can be used for black-box HPO. Since it's hydra-based, we simply set up a script which takes a configuration, runs it and returns the evaluation reward at the end. First, use pip to install the hypersweeper:
We use the [Hypersweeper](https://github.com/automl/hypersweeper/) package to demonstrate how ARLBench can be used for black-box HPO. Since it's hydra-based, we simply set up a script which takes a configuration, runs it and returns the evaluation reward at the end. First, use pip to install the hypersweeper:

```bash
pip install hypersweeper
@@ -34,6 +34,93 @@ Finally, we can also use the state-of-the-art SMAC optimizer by changing to the
```bash
python run_arlbench.py --config-name=smac -m
```

You can switch between the environments and algorithms in ARLBench by specifying them on the command line like this:

```bash
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc
```

You can see exactly what this command changes by looking at the example configs. In 'configs/algorithm/ppo.yaml', for example, we see the following:

```yaml
# @package _global_
algorithm: ppo
hp_config:
clip_eps: 0.2
ent_coef: 0.0
gae_lambda: 0.95
gamma: 0.99
learning_rate: 0.0003
max_grad_norm: 0.5
minibatch_size: 64
n_steps: 128
normalize_advantage: true
normalize_observations: false
update_epochs: 10
vf_clip_eps: 0.2
vf_coef: 0.5
nas_config:
activation: tanh
hidden_size: 64
```
These are the default arguments for the PPO algorithm in ARLBench. You can also override each of them individually, for example if you want to try a different value for gamma:
```bash
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc hp_config.gamma=0.8
```
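
Several Hydra overrides can also be combined in a single call, including the architecture settings under 'nas_config' (the values below are purely illustrative):

```bash
# Override two hyperparameters and the hidden layer size at once (illustrative values)
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc hp_config.gamma=0.8 hp_config.learning_rate=0.001 nas_config.hidden_size=128
```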

The search space specification works very similarly via a YAML file; the search space used above, 'configs/search_space/ppo_cc.yaml', contains:
```yaml
seed: 0
hyperparameters:
hp_config.learning_rate:
type: uniform_float
upper: 0.1
lower: 1.0e-06
log: true
hp_config.ent_coef:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.minibatch_size:
type: categorical
choices: [128, 256, 512]
hp_config.gae_lambda:
type: uniform_float
upper: 0.9999
lower: 0.8
log: false
hp_config.clip_eps:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.vf_clip_eps:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.normalize_advantage:
type: categorical
choices: [True, False]
hp_config.vf_coef:
type: uniform_float
upper: 1.0
lower: 0.0
default: 0.5
log: false
hp_config.max_grad_norm:
type: uniform_float
upper: 1.0
lower: 0.0
log: false
```
This config sets a seed for the search space and lists the hyperparameters to configure, together with the values they can take. This way, the full HPO setting is specified with only YAML files, which makes the process easy to follow and simple to document for others.
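
If you want to experiment with a modified search space, one option is to copy such a file into 'configs/search_space/', adjust the ranges, and point Hydra at it. The file name below is hypothetical:

```bash
# 'my_ppo_space' refers to a hypothetical configs/search_space/my_ppo_space.yaml
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=my_ppo_space
```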
## 2. Heuristic Schedules
We can also use ARLBench to change the hyperparameter configuration dynamically. We provide a simple example of this in 'run_heuristic_schedule.py': as soon as the agent improves beyond a certain reward threshold, we decrease the exploration epsilon in DQN a bit. This is likely not the best approach in practice, so feel free to play around with this idea! To see the result, run:
@@ -42,10 +129,45 @@
```bash
python run_heuristic_schedule.py
```

Since we now run ARLBench dynamically, we have to consider another configuration option: the settings for dynamic execution. These are configured using the 'autorl' set of keys, the general settings for ARLBench. The default version looks like this:

```yaml
autorl:
seed: 42
env_framework: ${environment.framework}
env_name: ${environment.name}
env_kwargs: ${environment.kwargs}
eval_env_kwargs: ${environment.eval_kwargs}
n_envs: ${environment.n_envs}
algorithm: ${algorithm}
cnn_policy: ${environment.cnn_policy}
nas_config: ${nas_config}
n_total_timesteps: ${environment.n_total_timesteps}
checkpoint: []
checkpoint_name: "default_checkpoint"
checkpoint_dir: "/tmp"
state_features: []
objectives: ["reward_mean"]
optimize_objectives: "upper"
n_steps: 1
n_eval_steps: 100
n_eval_episodes: 10
```
As you can see, most of the defaults are decided by the environment and algorithm we choose. For dynamic execution, we are interested in the 'n_steps' and 'n_total_timesteps' keys.
'n_steps' decides how many steps should be taken in the AutoRL Environment - in other words, how many schedule intervals we'd like to have. The 'n_total_timesteps' key then decides the length of each interval.
In the current config, we do a single training interval consisting of the total number of environment steps suggested for our target domain. If we instead want to run a schedule of length 10, with each schedule segment taking 10,000 steps, we can change the configuration like this:
```bash
python run_heuristic_schedule.py autorl.n_steps=10 autorl.n_total_timesteps=10000
```
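
The remaining 'autorl' keys can be overridden in the same way; for example, to repeat the schedule with a different seed and more evaluation episodes (the values here are purely illustrative):

```bash
# Same schedule, but with a different seed and 20 evaluation episodes (illustrative values)
python run_heuristic_schedule.py autorl.n_steps=10 autorl.n_total_timesteps=10000 autorl.seed=1 autorl.n_eval_episodes=20
```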

## 3. Reactive Schedules

Lastly, we can also adjust the hyperparameters based on algorithm statistics. In 'run_reactive_schedule.py' we spike the learning rate if we see the gradient norm stagnating. See how it works by running:

```bash
python run_reactive_schedule.py
```

To actually configure to w
2 changes: 1 addition & 1 deletion examples/configs/base.yaml
@@ -32,6 +32,6 @@ autorl:
state_features: []
objectives: ["reward_mean"]
optimize_objectives: "upper"
n_steps: 10
n_steps: 1
n_eval_steps: 100
n_eval_episodes: 10
27 changes: 17 additions & 10 deletions examples/hypersweeper_tuning.ipynb
@@ -1,5 +1,22 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" import hypersweeper\n",
"except ImportError as e:\n",
" %pip install hypersweeper\n",
"try:\n",
" import pandas as pd\n",
"except ImportError as e:\n",
" %pip install pandas\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -575,16 +592,6 @@
"!python run_arlbench.py --config-name=random_search --multirun"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
27 changes: 15 additions & 12 deletions examples/schedules.ipynb
@@ -1,5 +1,20 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"try:\n",
" import seaborn as sns\n",
"except ImportError as e:\n",
" %pip install seaborn\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -63,18 +78,6 @@
"!python run_heuristic_schedule.py"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 27,
