Commit f580646: drafts for doc pages

TheEimer committed May 30, 2024 · 1 parent 1976f4c commit f580646
Showing 16 changed files with 189 additions and 75 deletions.
2 changes: 2 additions & 0 deletions docs/advanced_usage/algorithm_states.rst
@@ -0,0 +1,2 @@
Using the ARLBench States
==========================
2 changes: 2 additions & 0 deletions docs/advanced_usage/autorl_paradigms.rst
@@ -0,0 +1,2 @@
ARLBench and Different AutoRL Paradigms
=======================================
2 changes: 2 additions & 0 deletions docs/advanced_usage/dynamic_configuration.rst
@@ -0,0 +1,2 @@
Dynamic Configuration in ARLBench
==================================
4 changes: 2 additions & 2 deletions docs/advanced_usage/index.rst
@@ -1,5 +1,5 @@
Advanced Usage
==============
Advanced Configuration Options
==============================

.. toctree::
:hidden:
2 changes: 2 additions & 0 deletions docs/basic_usage/env_subsets.rst
@@ -0,0 +1,2 @@
The ARLBench Subsets
====================
18 changes: 14 additions & 4 deletions docs/basic_usage/index.rst
@@ -1,11 +1,21 @@
Basic Usage
===========
Benchmarking AutoRL Methods
============================

.. toctree::
:hidden:
:maxdepth: 2
objectives
env_subsets
seeding


.. warning::
ARLBench provides a basis for benchmarking different AutoRL methods. This section of the documentation focuses on black-box hyperparameter optimization, since it is the simplest use case of ARLBench.
We discuss the structure of ARLBench, the currently supported objectives, the environment subsets and search spaces we provide, and the seeding of experiments on their own subpages.
The most important question, however, is how to actually use ARLBench in your experiments. This is the workflow we propose:

1. Decide which RL algorithms to use as your HPO targets. Ideally, you will use all three: PPO, DQN, and SAC.
2. Decide which AutoRL methods you want to benchmark.
3. Decide which objectives you want to optimize for. We provide a variety of objectives from which you can select one or more.
4. Use the pre-defined search spaces to run your AutoRL method for several runs (see the sketch below). If there is a good reason to deviate from these search spaces, please report it alongside your results.
5. Evaluate the best configuration found on the environment test seeds and report this result.
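
For step 4, a minimal sketch using the Hypersweeper-based example script ``run_arlbench.py`` and the configs shipped in the repository's ``examples`` directory (names taken from the examples README; adapt them to your own setup):

.. code-block:: bash

   # Sweep the pre-defined PPO search space on CartPole with SMAC via Hypersweeper
   python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc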

This page is under construction.
2 changes: 2 additions & 0 deletions docs/basic_usage/objectives.rst
@@ -0,0 +1,2 @@
Objectives in ARLBench
======================
2 changes: 2 additions & 0 deletions docs/basic_usage/options.rst
@@ -0,0 +1,2 @@
ARLBench Options
================
2 changes: 2 additions & 0 deletions docs/basic_usage/seeding.rst
@@ -0,0 +1,2 @@
Considerations for Seeding
============================
41 changes: 0 additions & 41 deletions docs/commands.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/index.rst
@@ -9,7 +9,6 @@ Home
examples/index
basic_usage/index
advanced_usage/index
commands
api
glossary
faq
4 changes: 2 additions & 2 deletions docs/installation.rst
@@ -9,15 +9,15 @@ Either way, you will likely want to first create a virtual environment to instal
conda create -n arlbench python=3.10
conda activate arlbench
1. Using PyPI
### 1. Using PyPI

This is the simplest way to install ARLBench. Just run the following command:

.. code-block:: bash
pip install arlbench
2. Downloading from GitHub
### 2. Downloading from GitHub

This is the best way to install the latest version of ARLBench. First, clone the repository:

126 changes: 124 additions & 2 deletions examples/Readme.md
@@ -10,7 +10,7 @@ The "hypersweeper_tuning" and "schedules" notebooks can help you run these examp

## 1. Black-Box HPO

We use the 'hypersweeper' package to demonstrate how ARLBench can be used for black-box HPO. Since it's hydra-based, we simply set up a script which takes a configuration, runs it and returns the evaluation reward at the end. First, use pip to install the hypersweeper:
We use the [Hypersweeper](https://github.com/automl/hypersweeper/) package to demonstrate how ARLBench can be used for black-box HPO. Since it's hydra-based, we simply set up a script which takes a configuration, runs it and returns the evaluation reward at the end. First, use pip to install the hypersweeper:

```bash
pip install hypersweeper
@@ -34,6 +34,93 @@ Finally, we can also use the state-of-the-art SMAC optimizer by changing to the
```bash
python run_arlbench.py --config-name=smac -m
```

You can switch between the environments and algorithms in ARLBench by specifying them on the command line like this:

```bash
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc
```

You can see exactly what this command changes by looking at the example configs. In 'configs/algorithm/ppo.yaml', for example, we see the following:

```yaml
# @package _global_
algorithm: ppo
hp_config:
clip_eps: 0.2
ent_coef: 0.0
gae_lambda: 0.95
gamma: 0.99
learning_rate: 0.0003
max_grad_norm: 0.5
minibatch_size: 64
n_steps: 128
normalize_advantage: true
normalize_observations: false
update_epochs: 10
vf_clip_eps: 0.2
vf_coef: 0.5
nas_config:
activation: tanh
hidden_size: 64
```
These are the default arguments for the PPO algorithm in ARLBench. You can also override each of them individually, for example if you want to try a different value for gamma:
```bash
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc hp_config.gamma=0.8
```
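
Several Hydra overrides can also be combined in a single call, including the architecture settings under 'nas_config' (the values below are purely illustrative):

```bash
# Override two hyperparameters and the hidden layer size at once (illustrative values)
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=ppo_cc hp_config.gamma=0.8 hp_config.learning_rate=0.001 nas_config.hidden_size=128
```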

The search space specification works very similarly via a YAML file; the search space used above, 'configs/search_space/ppo_cc.yaml', contains:
```yaml
seed: 0
hyperparameters:
hp_config.learning_rate:
type: uniform_float
upper: 0.1
lower: 1.0e-06
log: true
hp_config.ent_coef:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.minibatch_size:
type: categorical
choices: [128, 256, 512]
hp_config.gae_lambda:
type: uniform_float
upper: 0.9999
lower: 0.8
log: false
hp_config.clip_eps:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.vf_clip_eps:
type: uniform_float
upper: 0.5
lower: 0.0
log: false
hp_config.normalize_advantage:
type: categorical
choices: [True, False]
hp_config.vf_coef:
type: uniform_float
upper: 1.0
lower: 0.0
default: 0.5
log: false
hp_config.max_grad_norm:
type: uniform_float
upper: 1.0
lower: 0.0
log: false
```
This config sets a seed for the search space and lists the hyperparameters to configure, together with the values they can take. This way, the full HPO setting is specified with only YAML files, which makes the process easy to follow and simple to document for others.
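
If you want to experiment with a modified search space, one option is to copy such a file into 'configs/search_space/', adjust the ranges, and point Hydra at it. The file name below is hypothetical:

```bash
# 'my_ppo_space' refers to a hypothetical configs/search_space/my_ppo_space.yaml
python run_arlbench.py --config-name=smac -m environment=cc_cartpole algorithm=ppo search_space=my_ppo_space
```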
## 2. Heuristic Schedules
We can also use ARLBench to change the hyperparameter configuration dynamically. We provide a simple example of this in 'run_heuristic_schedule.py': as soon as the agent improves beyond a certain reward threshold, we decrease the exploration epsilon in DQN a bit. This is likely not the best approach in practice, so feel free to play around with this idea! To see the result, run:
@@ -42,10 +129,45 @@
```bash
python run_heuristic_schedule.py
```

Since we now run ARLBench dynamically, we have to consider another configuration option: the settings for dynamic execution. These are configured using the 'autorl' set of keys, the general settings for ARLBench. The default version looks like this:

```yaml
autorl:
seed: 42
env_framework: ${environment.framework}
env_name: ${environment.name}
env_kwargs: ${environment.kwargs}
eval_env_kwargs: ${environment.eval_kwargs}
n_envs: ${environment.n_envs}
algorithm: ${algorithm}
cnn_policy: ${environment.cnn_policy}
nas_config: ${nas_config}
n_total_timesteps: ${environment.n_total_timesteps}
checkpoint: []
checkpoint_name: "default_checkpoint"
checkpoint_dir: "/tmp"
state_features: []
objectives: ["reward_mean"]
optimize_objectives: "upper"
n_steps: 1
n_eval_steps: 100
n_eval_episodes: 10
```
As you can see, most of the defaults are decided by the environment and algorithm we choose. For dynamic execution, we are interested in the 'n_steps' and 'n_total_timesteps' keys.
'n_steps' decides how many steps should be taken in the AutoRL Environment - in other words, how many schedule intervals we'd like to have. The 'n_total_timesteps' key then decides the length of each interval.
In the current config, we do a single training interval consisting of the total number of environment steps suggested for our target domain. If we instead want to run a schedule of length 10, with each schedule segment taking 10,000 steps, we can change the configuration like this:
```bash
python run_heuristic_schedule.py autorl.n_steps=10 autorl.n_total_timesteps=10000
```
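
The remaining 'autorl' keys can be overridden in the same way; for example, to repeat the schedule with a different seed and more evaluation episodes (the values here are purely illustrative):

```bash
# Same schedule, but with a different seed and 20 evaluation episodes (illustrative values)
python run_heuristic_schedule.py autorl.n_steps=10 autorl.n_total_timesteps=10000 autorl.seed=1 autorl.n_eval_episodes=20
```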

## 3. Reactive Schedules

Lastly, we can also adjust the hyperparameters based on algorithm statistics. In 'run_reactive_schedule.py' we spike the learning rate if we see the gradient norm stagnating. See how it works by running:

```bash
python run_reactive_schedule.py
```

To actually configure to w
2 changes: 1 addition & 1 deletion examples/configs/base.yaml
@@ -32,6 +32,6 @@ autorl:
state_features: []
objectives: ["reward_mean"]
optimize_objectives: "upper"
n_steps: 10
n_steps: 1
n_eval_steps: 100
n_eval_episodes: 10
27 changes: 17 additions & 10 deletions examples/hypersweeper_tuning.ipynb
@@ -1,5 +1,22 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" import hypersweeper\n",
"except ImportError as e:\n",
" %pip install hypersweeper\n",
"try:\n",
" import pandas as pd\n",
"except ImportError as e:\n",
" %pip install pandas\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -575,16 +592,6 @@
"!python run_arlbench.py --config-name=random_search --multirun"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
27 changes: 15 additions & 12 deletions examples/schedules.ipynb
@@ -1,5 +1,20 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"try:\n",
" import seaborn as sns\n",
"except ImportError as e:\n",
" %pip install seaborn\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -63,18 +78,6 @@
"!python run_heuristic_schedule.py"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 27,
