To start a training session, once installed:
python Run.py
Defaults:
Agent=Agents.AC2Agent
task=atari/pong
Plots, logs, generated images, and videos are automatically stored in: ./Benchmarking.
Welcome ye, weary Traveller.
Stop here and rest at our local tavern,
Where all your reinforcements and supervisions be served, à la carte!
Drink up! 🍻
For detailed documentation, see our docs.
@article{UnifiedML,
title = {UnifiedML: A Unified Framework For Intelligence Training},
author = {Sam Lerman and Chenliang Xu},
howpublished = {https://github.com/AGI-init/UnifiedML-legacy},
year = {2023}
}
If you use this work, please give us a star ⭐ and be sure to cite the above!
An acknowledgment to Denis Yarats, whose excellent DrQV2 repo inspired much of this library and its design.
Yes.
Our AC2Agent supports discrete and continuous control, classification, generative modeling, and more.
See example scripts of various configurations below.
Let's get to business.
git clone [email protected]:agi-init/UnifiedML-legacy.git
cd UnifiedML-legacy
All dependencies can be installed via Conda:
conda env create --name ML --file=Conda.yml
conda activate ML
Depending on your CUDA version, you may need to redundantly install PyTorch with CUDA from pytorch.org/get-started after activating your Conda environment.
For example, for CUDA 11.6:
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
A collection of retro Atari games.
You can install the ROMs via AutoROM if you accept the license. First, install AutoROM:
pip install autorom
Then accept the license.
AutoROM --accept-license
Comes pre-installed! For any issues, consult the DMC repo.
Video of different tasks in action.
Eight different ladybug species in the iNaturalist dataset.
All datasets come ready-to-use.
That's it.
Train Atari example:
python Run.py task=atari/mspacman

Train DMC example:
python Run.py task=dmc/cheetah_run

Train Classify example:
python Run.py task=classify/mnist
Run.py handles learning and evaluation loops, saving, distributed training, logging, plotting.
Environment.py handles rollouts.
./Agents contains self-contained agents.
Click to interact
Train DQN Agent to play Ms. Pac-Man:
python Run.py task=atari/mspacman Agent=Agents.DQNAgent
- Our implementation expands on ensemble Q-learning with data regularization and Soft-DQN.
- Original Nature DQN paper.
Humanoid from pixels with DrQV2 Agent, a state-of-the-art algorithm for continuous control from images:
python Run.py task=dmc/humanoid_walk Agent=Agents.DrQV2Agent
Play Super Mario Bros. with Dueling DQN Agent, an extension of DQN that uses dueling Q networks:
python Run.py task=mario Agent=Agents.DuelingDQNAgent
The library's default Agent is our AC2 Agent (Agent=Agents.AC2Agent).
python Run.py

- +agent.depth=5 can activate a self-supervisor to predict temporal dynamics for a number of timesteps ahead, similar to Dreamer and SPR.
- +agent.num_actors=5 +agent.num_critics=5 can activate actor-critic ensembling.
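For example, both flags can presumably be combined on the default Pong task (an untested sketch, using only the flags described above):

python Run.py task=atari/pong +agent.depth=5 +agent.num_actors=5 +agent.num_critics=5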
In addition to RL, this agent supports classification, generative modeling, and various modes. Therefore we refer to it as a framework, not just an agent. The full array of the library's features and cross-domain compatibilities are supported by this agent.
Save videos with vlog=true.

Videos save to Benchmarking/<experiment>/<agent>/<suite>/<task>_<seed>_Video_Image/
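For instance (a simple sketch; the flag is just appended to any training command):

python Run.py task=atari/pong vlog=true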
Check out args.yaml for the full array of configurable options available, including:

- N-step rewards (nstep=)
- Frame stack (frame_stack=)
- Action repeat (action_repeat=)
- & more, with per-task defaults in ./Hyperparams/task

Please share your hyperparams if you discover new or better ones!
If you'd like to discretize a continuous domain, pass in discrete=true and specify the number of discrete bins per action dimension via num_actions=. If you'd like to continuous-ize a discrete domain, pass in discrete=false. Action space conversions are experimental.
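For example, one might discretize DMC's cheetah_run into 5 bins per action dimension (an untested sketch using only the flags just described):

python Run.py task=dmc/cheetah_run discrete=true num_actions=5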
The below sections describe many features in other domains, but chances are those features will work in RL as well. For example, a cosine annealing learning rate schedule can be toggled with lr_decay_epochs=100. It will anneal per-episode rather than per-epoch. Different model architectures, image transforms, EMAs, and more are all supported across domains! The vast majority of this hasn't been tested outside of its respective domain (CV, RL, etc.), so the research opportunity is plentiful!
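For instance, the cosine annealing schedule could presumably be applied to an RL task like so (a sketch, not a verified recipe):

python Run.py task=dmc/cheetah_run lr_decay_epochs=100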
Click to categorize
CNN on MNIST:
python Run.py task=classify/mnist
Note: RL=false is the default for classify tasks, which keeps training at standard supervised-only classification.
Variations
Since this is UnifiedML, there are a couple of noteworthy variations. You can ignore these if you are only interested in standard classification via cross-entropy supervision only.

- With RL=true, an augmented RL update joins the supervised learning update $\text{s.t. } reward = -error$ (experimental).
- Alternatively, and interestingly, supervise=false RL=true will only supervise via RL, $reward = -error$. This is pure-RL training and actually works!
Classify environments can actually be great testbeds for certain RL problems since they give near-instant and clear performance feedback.
Ignore these variations for doing standard classification.
Important features
Many popular features are unified in this library and generalized across RL/CV/generative domains, with more being added:
- Evaluation with exponential moving average (EMA) of params can be toggled with the ema=true flag; customize the decay rate with ema_decay=.
- See Custom Architectures for mix-and-matching custom or pre-defined (e.g. ViT, ResNet50) architectures via the command line syntax.
- Different optimizations can be configured too.
- As well as Custom Datasets.
- Ensembling is supported (e.g., +agent.num_actors=).
- Training with weight decay can be toggled via weight_decay=.
- A cosine annealing learning rate schedule can be applied for $N$ epochs (or episodes in RL) with lr_decay_epochs=.
- And TorchVision transforms can be passed in as dicts via transform=.
For example,
python Run.py task=classify/cifar10 weight_decay=0.01 transform="{RandomHorizontalFlip:{p:0.5}}" Eyes=Blocks.Architectures.ResNet18
The above reports results for the task= and Eyes= of the above script.
And if you set supervise=false RL=true, we get about the same score... vis-à-vis pure-RL.
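The EMA and learning-rate-schedule features from the list above can be appended the same way, e.g. (an unverified sketch; the ema_decay value here is arbitrary):

python Run.py task=classify/cifar10 ema=true ema_decay=0.99 lr_decay_epochs=100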
This library is meant to be useful for academic research, and out of the box supports many datasets, including
- Tiny-ImageNet (task=classify/tinyimagenet),
- iNaturalist (task=classify/inaturalist),
- CIFAR-100 (task=classify/cifar100),
- & more, normalized and with no manual preparation needed
Click to synth
Via the generate=true flag:
python Run.py task=classify/mnist generate=true
Synthesized MNIST images, conjured up and imagined by a simple MLP.
Saves to ./Benchmarking/<experiment>/<Agent name>/<task>_<seed>_Video_Image/.
Defaults can be easily modified with custom architectures or even datasets as elaborated in Custom Architectures and Custom Datasets. Let's try the above with a CNN Discriminator:
python Run.py task=classify/mnist generate=true Discriminator=CNN +agent.num_critics=1
+agent.num_critics=1
uses only a single Discriminator rather than ensembling as is done in RL. See How Is This Possible? for more details on the unification.
Or a ResNet18:
python Run.py task=classify/mnist generate=true Discriminator=ResNet18
Let's speed up training by turning off the default image augmentation, which is overkill anyway for this simple case:
python Run.py task=classify/mnist generate=true Aug=Identity +agent.num_critics=1
Aug=Identity
substitutes the default random cropping image-augmentation with the Identity function, thereby disabling it.
Generative mode implicitly treats training as offline, and assumes a replay is saved that can be loaded. As long as a dataset is available or a replay has been saved, generate=true
will work for any defined visual task, making it a powerful hyper-parameter that can just work. For now, only visual (image) tasks are compatible.
Can even work with RL tasks (due to frame stack, the generated images are technically multi-frame videos).
python Run.py task=atari/breakout generate=true
Make sure you have saved a replay that can be loaded before doing this.
Click to remember
Agents are automatically saved at the end of training:
python Run.py train_steps=2
Agents can be saved periodically and/or loaded with the save_per_steps=
or load=true
flags respectively:
# Saves periodically
python Run.py save_per_steps=100000
# Load
python Run.py load=true
Agents may be trained without saving by adding the save=false
flag.
An experience replay can be saved and/or loaded with the replay.save=true
or replay.load=true
flags.
# Save
python Run.py replay.save=true
# Load
python Run.py replay.load=true
Online tasks, such as online RL, will create a new replay if replay.load=false, or (careful!) potentially delete the current replay at the end of training if replay.save=false.
By default, classify tasks are offline, meaning you don't have to worry about loading or saving replays. Since the dataset is static, creating/loading is handled automatically.
Click here to learn more about replays
In UnifiedML, replays are an efficient accelerated storage format for data that support both static and dynamic (changing/growing) datasets.
You can disable the use of replays with stream=true
, which just sends data to the Agent directly from the environment. In RL, this is equivalent to on-policy training. In classification, it means you'll just directly use the Pytorch Dataset, without all the fancy replay features and accelerations.
Replays are recommended for RL because on-policy algorithmic support is currently limited.
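For example, to stream MNIST directly from its Pytorch Dataset instead of a replay (a sketch using the stream=true flag described above):

python Run.py task=classify/mnist stream=true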
~
Agents and replays save to ./Checkpoints and ./Datasets/ReplayBuffer respectively, per unique experiment; otherwise they are overwritten.
A unique experiment is distinguished by the flags: experiment=
, Agent=
, suite=
, task_name=
, and seed=
.
You can change the Agent load/save path with load_path=
/save_path=
, and replay.path=
for replays. All three accept string paths e.g. load_path='./Checkpoints/Exp/AC2Agent/classify/MNIST_1.pt'
.
Click to play retroactively
Offline means the dataset size doesn't grow.
From a saved experience replay, sans additional rollouts:
python Run.py task=atari/breakout offline=true
Assumes a replay is saved.
It implicitly sets replay.load=true and replay.save=true, and only does learning updates and evaluation rollouts.
offline=true
is the default for classification, where datasets are automatically downloaded and created into offline replays.
Click to de-centralize
The simplest way to do distributed training is to use the parallel=true
flag,
python Run.py parallel=true
which automatically parallelizes the Encoder's "Eyes" across all visible GPUs. The Encoder is usually the most compute-intensive architectural portion.
To share whole agents across multiple parallel instances and/or machines,
Click to expand
you can use the load_per_steps=
flag.
For example, a data-collector agent and an update agent,
python Run.py learn_per_steps=0 replay.save=true load_per_steps=1
python Run.py offline=true replay.offline=false replay.save=true replay.load=true save_per_steps=2
in concurrent processes.
Since both use the same experiment name, they will save and load from the same agent and replay, thereby emulating distributed training. Just make sure the replay from the first script is created before launching the second script. Highly experimental!
Here is another example of distributed training, via shared replays:
python Run.py replay.save=true
Then, in a separate process, after that replay has been created:
python Run.py replay.load=true replay.save=true
Click to construct
A rich and expressive command line syntax is available for selecting and customizing architectures such as those defined in ./Blocks/Architectures.
ResNet18 on CIFAR-10:
python Run.py task=classify/cifar10 Eyes=ResNet18
Atari with ViT:
python Run.py Eyes=ViT +eyes.patch_size=7
Shorthands like Aug
, Eyes
, and Pool
make it easy to plug and play custom architectures. All of an agent's architectural parts can be accessed, mixed, and matched with their corresponding recipe shorthand names.
Generally, the rule of thumb is Capital names for paths to classes (such as Eyes=Blocks.Architectures.MLP
) and lowercase names for shortcuts to tinker with model args (such as +eyes.depth=1
).
Architectures imported in Blocks/Architectures/__init__.py can be accessed directly without needing to enter their full paths; Eyes=ViT works just as well as Eyes=Blocks.Architectures.ViT.
See more examples
CIFAR-10 with ViT:
python Run.py Eyes=ViT task=classify/cifar10 ema=true weight_decay=0.01 +eyes.depth=6 +eyes.out_channels=512 +eyes.mlp_hidden_dim=512 transform="{RandomCrop:{size:32,padding:4},RandomHorizontalFlip:{}}" Aug=Identity
Here is a more complex example, disabling the Encoder's flattening of the feature map, and instead giving the Actor and Critic unique Attention Pooling operations on their trunks to pool the unflattened features. The Identity
architecture disables that flattening component.
python Run.py task=classify/mnist Q_trunk=Transformer Pi_trunk=Transformer Pool=Identity
Here is a nice example of the critic using a small CNN for downsampling features:
python Run.py task=dmc/cheetah_run Q_trunk=CNN +q_trunk.depth=1 pool=Identity
A CNN Actor and Critic:
python Run.py Q_trunk=CNN Pi_trunk=CNN +q_trunk.depth=1 +pi_trunk.depth=1 Pool=Identity
A little secret, but pytorch code can be passed directly too via quotes:
python Run.py "eyes='CNN(kwargs.input_shape,32,depth=3)'"
python Run.py "eyes='torch.nn.Conv2d(kwargs.input_shape[0],32,kernel_size=3)'"
Some blocks have default args which can be accessed with the kwargs.
interpolation shown above.
An intricate example of the expressiveness of this syntax:
python Run.py Optim=SGD 'Pi_trunk="nn.Sequential(MLP(input_shape=kwargs.input_shape, output_shape=kwargs.output_shape),nn.ReLU(inplace=True))"' lr=0.01
Both the uppercase and lowercase syntax support direct function calls in place of usual syntax, with function calls distinguished by the syntactical quotes and parentheticals.
In both the uppercase and lowercase syntax, the parser automatically registers the imports/class paths in Utils., including the modules/classes torch and torch.nn, and architectures/paths in ./Blocks/Architectures/ like CNN, for direct access with no need to type the Utils. prefix.
To make a custom architecture, you can use any Pytorch module which outputs a tensor. Woohoo, done.
To make it mix-and-matchable throughout UnifiedML for arbitrary dimensionalities and domains, to generalize as much as possible, you can add:

- input_shape and output_shape arguments to the __init__ method, such that your architecture can have a defined adaptation scheme for different possible shapes.
- Support for arbitrarily many inputs (such as by concatenating them) of weird shapes (broadcasting them).
- A repr_shape(*_) method that pre-computes the output shape given a varying number of input shape dimensions as arguments.
None of these add-ons are necessary, but if you include all of them, then your architecture can adapt to everything. There are lazy ways to hack all of these features into any architecture, or you can follow the pretty basic templates used in our existing array of architectures. Most of our architectures can probably be used to build whatever architecture you're trying to build, honestly, or at least something similar enough that you could have a good jumping-off point.
In short: To make your own architecture mix-and-matchable, just put it in a pytorch module with initialization options for input_shape
and output_shape
, as in the architectures in ./Blocks/Architectures
.
The Encoder Eyes automatically adapt 2d conv to 1d conv by the way (if data is 1d).
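As a rough illustration, here is a minimal sketch of such a module (the class name and internals are hypothetical and not from the library; only the input_shape / output_shape / repr_shape conventions come from the description above):

import torch
from torch import nn

class TinyEyes(nn.Module):  # hypothetical example, not part of UnifiedML
    def __init__(self, input_shape=(3, 32, 32), output_shape=None):
        super().__init__()
        in_channels = input_shape[0]
        out_channels = output_shape[0] if output_shape is not None else 32
        # A single conv that preserves spatial size
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def repr_shape(self, *dims):
        # Pre-compute the output shape from input shape dims (assumes channels, height, width)
        channels, height, width = dims
        return self.conv.out_channels, height, width

    def forward(self, x):
        return torch.relu(self.conv(x))

Then, assuming it lives in an importable module (hypothetical path), something like Eyes=MyModule.TinyEyes should plug it in via the same syntax shown above.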
Click to search/explore
You can pass in a path to the Optim=
flag or select a built-in Pytorch optimizer like SGD
, or both as below:
python Run.py Optim=Utils.torch.optim.SGD lr=0.1
Equivalently via the expressive recipe interface:
python Run.py Optim=SGD lr=0.1
or
python Run.py "optim='torch.optim.SGD(kwargs.params, lr=0.1)'"
In the first two examples, the lr= flag was optional. The default learning rate is 1e-4, and we could have written +optim.lr= instead.
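That is, presumably the following is equivalent (a sketch based on the note above):

python Run.py Optim=SGD +optim.lr=0.1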
Per-block optimizers. For example, just the Encoder:
python Run.py encoder.Optim=SGD
Learning rate schedulers. Scheduler=
works analogously to Optim=
, or just use the lr_decay_epochs=
shorthand for cosine annealing e.g.
python Run.py task=classify/mnist lr_decay_epochs=100
Click to let there be light
As an example of custom environments, we provide the Super Mario Bros. game environment in ./Datasets/Suites/SuperMario.py.
To use it, you can just pass in the path to Env=
and specify the suite
and the task_name
to your choosing:
python Run.py Env=Datasets.Suites.SuperMario.SuperMario suite=SuperMario task_name=Mario
Any Hyperparams you don't specify will be inherited from the default task, atari/pong
in ./Hyperparams/task/atari/pong.yaml
, or whichever task is selected.
If you want to save Hyperparams and formally define a task, you can create files like ./Hyperparams/task/mario.yaml
in the ./Hyperparams/task/ directory:
# ./Hyperparams/task/mario.yaml
defaults:
- _self_
Env: Datasets.Suites.SuperMario.SuperMario
suite: SuperMario
task_name: Mario
discrete: true
action_repeat: 4
truncate_episode_steps: 250
nstep: 3
frame_stack: 4
train_steps: 3000000
stddev_schedule: 'linear(1.0,0.1,800000)'
Now you can launch Mario with:
python Run.py task=mario
You can also customize params, such as worlds and stages, with the +env. syntax:
python Run.py task=mario +env.stage=2
Click to read, parse, & boot up
You can pass in any Dataset as follows:
python Run.py task=classify/custom Dataset=torchvision.datasets.MNIST
That will launch MNIST. Another example, with a custom class and path,
python Run.py task=classify/custom Dataset=Datasets.Suites._TinyImageNet.TinyImageNet
This will initiate a classify task on the custom-defined TinyImageNet
Dataset.
You can change the task name as it's saved for benchmarking and plotting, with task_name=. The default is the class name, such as TinyImageNet.
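For instance (a sketch; the task_name value here is arbitrary, just for illustration):

python Run.py task=classify/custom Dataset=Datasets.Suites._TinyImageNet.TinyImageNet task_name=MyTinyImageNet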
UnifiedML is compatible with datasets & domains besides Vision.
Thanks to dimensionality adaptivity (slide 12), for example, you can train the default CNN architecture on raw 1D audio:
python Run.py task=classify/custom Dataset=Datasets.Suites._SpeechCommands.SpeechCommands Aug=Identity
Gets a perfect score on speech command classification from raw 1D audio with the default CNN setting.
More details and examples
For a non-Vision/Audio tutorial, we provide a full end-to-end example in Crystal classification, reproducing classifying crystal structures and space groups from X-ray diffraction patterns.
Note: You can also specify an independent test dataset explicitly with TestDataset=.
Learn to cook
Save hyperparams to .yaml
files by defining them in the ./Hyperparams/task/ directory. There are many saved examples already.
If you've defined a .yaml
file called my_recipe.yaml
for example, you can use it via
python Run.py task=my_recipe
Please share your recipes in our Discussions page if you discover new or better hyperparams for a problem.
Recipes can also be defined temporarily via command line without saving them to .yaml files.
Below is a running list of some out-of-the-ordinary or interesting ones:
python Run.py Eyes=Sequential +eyes._targets_="[CNN, Transformer]" task=classify/mnist
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[Transformer, AvgPool]" +pool.positional_encodings=false
python Run.py task=classify/mnist Pool=Residual +pool.model=Transformer +pool.depth=2
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[ChannelSwap, Residual]" +'pool.model="MLP(kwargs.input_shape[-1])"' +'pool.down_sample="MLP(input_shape=kwargs.input_shape[-1])"'
python Run.py task=classify/mnist Pool=RN
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[RN, AvgPool]"
python Run.py task=classify/mnist Eyes=Perceiver +eyes.depths="[3, 3, 2]" +eyes.num_tokens=128
python Run.py task=classify/mnist Predictor=Perceiver +predictor.token_dim=32
python Run.py task=classify/mnist Predictor=Perceiver train_steps=2
python Run.py task=dmc/cheetah_run Predictor=load +predictor.path=./Checkpoints/Exp/DQNAgent/classify/MNIST_1.pt +predictor.attr=actor.Pi_head +predictor.device=cpu save=false
python Run.py task=classify/mnist Eyes=Identity Predictor=Perceiver +predictor.depths=10
python Run.py Aug=Sequential +aug._targets_="[IntensityAug, RandomShiftsAug]" +aug.scale=0.05 aug.pad=4
These are also useful for testing whether I've broken things.
Click to see
Plots automatically save to ./Benchmarking/<experiment>/
; the default experiment is experiment=Exp
.
python Run.py
--> ./Benchmarking/Exp/
Optionally plot multiple experiments
python Run.py experiment=Exp2 plotting.plot_experiments="['Exp', 'Exp2']"
Alternatively, you can call Plot.py
directly
python Plot.py plot_experiments="['Exp', 'Exp2']"
to generate plots. Here, the <experiment>
directory name will be the underscore_concatenated union of all experiment names ("Exp_Exp2
").
Plotting also accepts regex expressions. For example, to plot all experiments with Exp
in the name:
python Plot.py plot_experiments="['Exp.*']"
Another option is to use WandB, which is supported by UnifiedML:
python Run.py logger.wandb=true
You can connect UnifiedML to your WandB account by first running wandb login
in your Conda environment.
To do a hyperparameter sweep, just use the -m
flag.
python Run.py -m task=atari/pong,classify/mnist seed=1,2,3
Log video during evaluations with log_media=true.
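These logging flags can presumably be combined with the WandB option above, e.g. (an unverified sketch):

python Run.py logger.wandb=true log_media=true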
Click to write your own paper
We have released our slide deck!
Feel free to use our UnifiedML templates and figures in your work, citing us of course.
Open-source research for minimal redundancy and optimal standardization is the way to go, balancing privacy and de-centrality, and streamlining successive works that depend on ours in good faith. Post your own designs and assets here in the discussion board. Read the rules to keep citations and credit attribution fair.
Atari
We can attain 100% mean human-normalized score across the Atari-26 benchmark suite in about 1m environment steps.
The below example script shows how to launch training for just Pong and Breakout with AC2Agent
:
python Run.py task=atari/pong,atari/breakout -m
The results are reported for all 26 games and 3 different agents:
We found these results to be pretty stable across a range of exploration rates as well:
Each time point averages over 10 evaluation episodes (and 26 games).
DCGAN
The simplest way to do DCGAN is to use the DCGAN architecture:
python Run.py task=classify/celeba generate=true Discriminator=DCGAN.Discriminator Generator=DCGAN.Generator train_steps=50000
We can then improve the results, and speed up training tenfold, by modifying the hyperparameters:
python Run.py task=classify/celeba generate=true Discriminator=DCGAN.Discriminator Generator=DCGAN.Generator z_dim=100 Aug=Identity Optim=Adam '+optim.betas=[0.5, 0.999]' lr=2e-4 +agent.num_critics=1 train_steps=5000
We use our new Creator framework to unify RL discrete and continuous action spaces, as elaborated in our paper.
Then we frame actions as "predictions" in supervised learning. We can even augment supervised learning with an RL phase, treating reward as negative error.
For generative modeling, well, it turns out that the difference between a Generator-Discriminator and Actor-Critic is rather nominal.
All files are designed for pedagogical clarity and extensibility for research, to be useful for educational and innovative purposes, with simplicity at heart.
Please support financially by Sponsoring.
We are a nonprofit, single-PhD-student team. If possible, compute resources are appreciated.
Feel free to contact agi.__init__.
I am always looking for collaborators. Don't hesitate to volunteer in any way to help realize the full potential of this library.
Non-legacy version: here.