AGI-init/UnifiedML-legacy

πŸƒ Running The Code

To start a training session, once installed:

python Run.py

Defaults:

Agent=Agents.AC2Agent

task=atari/pong

Plots, logs, generated images, and videos are automatically stored in: ./Benchmarking.

Welcome ye, weary Traveller.

Stop here and rest at our local tavern,

Where all your reinforcements and supervisions be served, à la carte!

Drink up! 🍻

πŸ–ŠοΈ Paper & Citing

For detailed documentation, see our πŸ“œ.

@article{UnifiedML,
  title   = {UnifiedML: A Unified Framework For Intelligence Training},
  author  = {Sam Lerman and Chenliang Xu},
  howpublished = {https://github.com/AGI-init/UnifiedML-legacy},
  year    = {2023}
}

If you use this work, please give us a star ⭐ and be sure to cite the above!

An acknowledgment to Denis Yarats, whose excellent DrQV2 repo inspired much of this library and its design.

β˜‚οΈ Unified Learning?

Yes.

Our AC2Agent supports discrete and continuous control, classification, generative modeling, and more.

See example scripts of various configurations below.

🔧 Setting Up

Let's get to business.

1. Clone The Repo

git clone git@github.com:agi-init/UnifiedML-legacy.git
cd UnifiedML-legacy

2. Gemme Some Dependencies

All dependencies can be installed via Conda:

conda env create --name ML --file=Conda.yml

3. Activate Your Conda Env.

conda activate ML

ⓘ Depending on your CUDA version, you may need to reinstall PyTorch with CUDA support from pytorch.org/get-started after activating your Conda environment.

For example, for CUDA 11.6:

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

πŸ•ΉοΈ Installing The Suites

1. Atari Arcade


A collection of retro Atari games.

You can install via AutoROM if you accept the license. First install AutoROM.

pip install autorom

Then accept the license.

AutoROM --accept-license

2. DeepMind Control

Comes pre-installed! For any issues, consult the DMC repo.

Video of different tasks in action.

3. Classify


Eight different ladybug species in the iNaturalist dataset.

All datasets come ready-to-use ✅

That's it.

💡 Train Atari example: python Run.py task=atari/mspacman

💡 Train DMC example: python Run.py task=dmc/cheetah_run

💡 Train Classify example: python Run.py task=classify/mnist

πŸ—„οΈ Key files

Run.py handles learning and evaluation loops, saving, distributed training, logging, plotting.

Environment.py handles rollouts.

./Agents contains self-contained agents.

πŸ” Full Tutorials

RL

πŸ” Click to interact

Train DQN Agent to play Ms. Pac-Man:

python Run.py task=atari/mspacman Agent=Agents.DQNAgent

——❖——

Humanoid from pixels with DrQV2 Agent, a state-of-the-art algorithm for continuous control from images:

python Run.py task=dmc/humanoid_walk Agent=Agents.DrQV2Agent

⋆⋅☆⋅⋆

Play Super Mario Bros. with Dueling DQN Agent, an extension of DQN that uses dueling Q networks:

python Run.py task=mario Agent=Agents.DuelingDQNAgent

•⎽⎼⎻⎺⎺⎻⎼⎽⎽⎼✧ ☼ 𖥸 ☽ ✧⎼⎽⎽⎼⎻⎺⎺⎻⎼⎽•

The library's default Agent is our AC2 Agent (Agent=Agents.AC2Agent).

python Run.py
  • +agent.depth=5 can activate a self-supervisor to predict temporal dynamics for a number of timesteps ahead, similar to Dreamer and SPR.
  • +agent.num_actors=5 +agent.num_critics=5 can activate actor-critic ensembling.

In addition to RL, this agent supports classification, generative modeling, and various modes. Therefore we refer to it as a framework, not just an agent. The full array of the library's features and cross-domain compatibilities is supported by this agent.

⎽⎼⎻⎺⎺⎻⎼⎽⎽⎼⎻⎺⎺⎻⎼⎽⎽⎼⎻⎺⎺⎻⎼⎽⎽⎼⎻⎺⎺⎻⎼⎽

Save videos with vlog=true.

🎬 🎥 -> Benchmarking/<experiment>/<agent>/<suite>/<task>_<seed>_Video_Image/

Check out args.yaml for the full array of configurable options available, including

  • N-step rewards (nstep=)
  • Frame stack (frame_stack=)
  • Action repeat (action_repeat=)
  • & more, with per-task defaults in /Hyperparams/task. Please share your hyperparams if you discover new or better ones!
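As a rough illustration of what the nstep= option refers to, the sketch below accumulates an n-step discounted reward. The function name and the discount factor gamma are assumptions made for this example; the library's own implementation may differ.

```python
def n_step_reward(rewards, gamma=0.99):
    """Discounted sum of the next n rewards: r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1).

    `rewards` holds the n rewards that follow a state.
    `gamma` is a hypothetical discount factor chosen for illustration.
    """
    total = 0.0
    for i, reward in enumerate(rewards):
        total += (gamma ** i) * reward
    return total


# With three steps (as in nstep=3), rewards 1, 0, 2 and gamma=0.5:
# 1 + 0.5 * 0 + 0.25 * 2
print(n_step_reward([1.0, 0.0, 2.0], gamma=0.5))
```

Larger n propagates reward information faster at the cost of relying more on the behavior that generated the stored trajectory.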

ⓘ If you'd like to discretize a continuous domain, pass in discrete=true and specify the number of discrete bins per action dimension via num_actions=. If you'd like to continuous-ize a discrete domain, pass in discrete=false. Action space conversions are experimental.
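One plausible way to picture discretization with num_actions= bins per action dimension is shown below. This is only a sketch: the evenly spaced bins, the [-1, 1] range, and the function name are all assumptions, and the library's actual conversion may work differently.

```python
def bin_to_action(bin_index, num_actions, low=-1.0, high=1.0):
    """Map a discrete bin index in [0, num_actions) to an evenly spaced value in [low, high].

    The range and the even spacing are illustrative assumptions.
    """
    if num_actions < 2:
        raise ValueError("need at least two bins to span a range")
    if not 0 <= bin_index < num_actions:
        raise ValueError("bin_index out of range")
    step = (high - low) / (num_actions - 1)
    return low + bin_index * step


# Five bins for one action dimension
print([bin_to_action(i, 5) for i in range(5)])
```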

💡 The sections below describe many features in other domains, but chances are those features will work in RL as well. For example, a cosine annealing learning rate schedule can be toggled with: lr_decay_epochs=100. It will anneal per-episode rather than per-epoch. Different model architectures, image transforms, EMAs, and more are all supported across domains!

The vast majority of this hasn't been tested outside of its respective domain (CV, RL, etc.), so there is plenty of research opportunity!

Classification

πŸ” Click to categorize

CNN on MNIST:

python Run.py task=classify/mnist 

Note: RL=false is the default for classify tasks, which keeps training to standard supervised-only classification.

Variations

Since this is UnifiedML, there are a couple of noteworthy variations. You can ignore these if you are only interested in standard classification via cross-entropy supervision.

  1. With RL=true, an augmented RL update joins the supervised learning update, with $\text{reward} = -\text{error}$ (experimental).

  2. Alternatively, and interestingly, supervise=false RL=true will only supervise via RL, with $\text{reward} = -\text{error}$. This is pure-RL training and actually works!
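To illustrate the reward = -error idea in the two variations above, here is a minimal sketch that turns a classifier's error on one sample into a reward. Using cross-entropy as the error, and the function names, are assumptions for this example; the source does not specify which error the RL update uses.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy error for one sample, via a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def reward_from_error(logits, label):
    """Reward as negative error: lower error means higher reward."""
    return -cross_entropy(logits, label)


# A confident correct prediction earns a higher reward than a confident wrong one
print(reward_from_error([4.0, 0.0], label=0))
print(reward_from_error([4.0, 0.0], label=1))
```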

Classify environments can actually be great testbeds for certain RL problems since they give near-instant and clear performance feedback.

Ignore these variations for doing standard classification.

Important features

Many popular features are unified in this library and generalized across RL/CV/generative domains, with more being added:

For example,

python Run.py task=classify/cifar10 weight_decay=0.01 transform="{RandomHorizontalFlip:{p:0.5}}" Eyes=Blocks.Architectures.ResNet18

The above reaches $94\%$ on CIFAR-10 with a ResNet18, which is pretty good. Changing datasets or architectures is as easy as modifying the task= and Eyes= parts of the above script.

And if you set supervise=false RL=true, we get about the same score... vis-à-vis pure-RL.

This library is meant to be useful for academic research, and out of the box supports many datasets, including

  • Tiny-ImageNet (task=classify/tinyimagenet),
  • iNaturalist (task=classify/inaturalist),
  • CIFAR-100 (task=classify/cifar100),
  • & more, all normalized, with no manual preparation needed.

Generative Modeling

πŸ” Click to synth

Via the generate=true flag:

python Run.py task=classify/mnist generate=true


Synthesized MNIST images, conjured up and imagined by a simple MLP.

Saves to ./Benchmarking/<experiment>/<Agent name>/<task>_<seed>_Video_Image/.

Defaults can be easily modified with custom architectures or even datasets as elaborated in Custom Architectures and Custom Datasets. Let's try the above with a CNN Discriminator:

python Run.py task=classify/mnist generate=true Discriminator=CNN +agent.num_critics=1

+agent.num_critics=1 uses only a single Discriminator rather than ensembling as is done in RL. See How Is This Possible? for more details on the unification.

Or a ResNet18:

python Run.py task=classify/mnist generate=true Discriminator=ResNet18

Let's speed up training by turning off the default image augmentation, which is overkill anyway for this simple case:

python Run.py task=classify/mnist generate=true Aug=Identity +agent.num_critics=1

Aug=Identity substitutes the default random cropping image-augmentation with the Identity function, thereby disabling it.

Generative mode implicitly treats training as offline, and assumes a replay is saved that can be loaded. As long as a dataset is available or a replay has been saved, generate=true will work for any defined visual task, making it a powerful hyper-parameter that can just work. For now, only visual (image) tasks are compatible.

This can even work with RL tasks (due to frame stack, the generated images are technically multi-frame videos).

python Run.py task=atari/breakout generate=true

Make sure you have saved a replay that can be loaded before doing this.

Saving

πŸ” Click to remember

Agents are automatically saved at the end of training:

python Run.py train_steps=2

Agents can be saved periodically and/or loaded with the save_per_steps= or load=true flags respectively:

# Saves periodically
python Run.py save_per_steps=100000

# Load
python Run.py load=true

Agents may be trained without saving by adding the save=false flag.

An experience replay can be saved and/or loaded with the replay.save=true or replay.load=true flags.

# Save
python Run.py replay.save=true

# Load
python Run.py replay.load=true

Online tasks, such as online RL, will create a new replay if replay.load=false, or (careful!) potentially delete the current replay at the end of training if replay.save=false.

By default, classify tasks are offline, meaning you don't have to worry about loading or saving replays. Since the dataset is static, creating/loading is handled automatically.

In UnifiedML, replays are an efficient accelerated storage format for data that support both static and dynamic (changing/growing) datasets.

You can disable the use of replays with stream=true, which just sends data to the Agent directly from the environment. In RL, this is equivalent to on-policy training. In classification, it means you'll just directly use the PyTorch Dataset, without all the fancy replay features and accelerations.

Replays are recommended for RL because on-policy algorithmic support is currently limited.

~

Agents and replays save to ./Checkpoints and ./Datasets/ReplayBuffer respectively, under a path unique to each experiment. Runs that share the same experiment identity overwrite each other's saves.

A unique experiment is distinguished by the flags: experiment=, Agent=, suite=, task_name=, and seed=.

You can change the Agent load/save path with load_path=/save_path=, and replay.path= for replays. All three accept string paths e.g. load_path='./Checkpoints/Exp/AC2Agent/classify/MNIST_1.pt'.
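The sketch below shows how the five identifying flags could map to a checkpoint path. The layout is inferred only from the example path above (./Checkpoints/Exp/AC2Agent/classify/MNIST_1.pt); the function name and the exact directory scheme are assumptions, not the library's confirmed behavior.

```python
def checkpoint_path(experiment, agent, suite, task_name, seed, root="./Checkpoints"):
    """Compose a checkpoint path from the flags that identify a unique experiment.

    Hypothetical helper: the layout mirrors the example path in the text above.
    """
    return f"{root}/{experiment}/{agent}/{suite}/{task_name}_{seed}.pt"


print(checkpoint_path("Exp", "AC2Agent", "classify", "MNIST", 1))

# Changing any one flag, such as the seed, yields a different file,
# so the two runs would not overwrite each other.
print(checkpoint_path("Exp", "AC2Agent", "classify", "MNIST", 2))
```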

Offline RL

πŸ” Click to play retroactively

Offline means the dataset size doesn't grow.

From a saved experience replay, sans additional rollouts:

python Run.py task=atari/breakout offline=true

Assumes a replay is saved.

Implicitly treats replay.load=true and replay.save=true, and only does learning updates and evaluation rollouts.

offline=true is the default for classification, where datasets are automatically downloaded and created into offline replays.

Distributed

πŸ” Click to de-centralize

The simplest way to do distributed training is to use the parallel=true flag,

python Run.py parallel=true 

which automatically parallelizes the Encoder's "Eyes" across all visible GPUs. The Encoder is usually the most compute-intensive architectural portion.

To share whole agents across multiple parallel instances and/or machines, you can use the load_per_steps= flag.

For example, a data-collector agent and an update agent,

python Run.py learn_per_steps=0 replay.save=true load_per_steps=1
python Run.py offline=true replay.offline=false replay.save=true replay.load=true save_per_steps=2

in concurrent processes.

Since both use the same experiment name, they will save and load from the same agent and replay, thereby emulating distributed training. Just make sure the replay from the first script is created before launching the second script. Highly experimental!

Here is another example of distributed training, via shared replays:

python Run.py replay.save=true 

Then, in a separate process, after that replay has been created:

python Run.py replay.load=true replay.save=true 

Custom Architectures

πŸ” Click to construct

A rich and expressive command line syntax is available for selecting and customizing architectures such as those defined in ./Blocks/Architectures.

ResNet18 on CIFAR-10:

python Run.py task=classify/cifar10 Eyes=ResNet18 

Atari with ViT:

python Run.py Eyes=ViT +eyes.patch_size=7

Shorthands like Aug, Eyes, and Pool make it easy to plug and play custom architectures. All of an agent's architectural parts can be accessed, mixed, and matched with their corresponding recipe shorthand names.

Generally, the rule of thumb is Capital names for paths to classes (such as Eyes=Blocks.Architectures.MLP) and lowercase names for shortcuts to tinker with model args (such as +eyes.depth=1).

Architectures imported in Blocks/Architectures/__init__.py can be accessed directly without entering their full paths, so Eyes=ViT works just as well as Eyes=Blocks.Architectures.ViT.

More examples

CIFAR-10 with ViT:

python Run.py Eyes=ViT task=classify/cifar10 ema=true weight_decay=0.01 +eyes.depth=6 +eyes.out_channels=512 +eyes.mlp_hidden_dim=512 transform="{RandomCrop:{size:32,padding:4},RandomHorizontalFlip:{}}" Aug=Identity

Here is a more complex example, disabling the Encoder's flattening of the feature map, and instead giving the Actor and Critic unique Attention Pooling operations on their trunks to pool the unflattened features. The Identity architecture disables that flattening component.

python Run.py task=classify/mnist Q_trunk=Transformer Pi_trunk=Transformer Pool=Identity

Here is a nice example of the critic using a small CNN for downsampling features:

python Run.py task=dmc/cheetah_run Q_trunk=CNN +q_trunk.depth=1 Pool=Identity

A CNN Actor and Critic:

python Run.py Q_trunk=CNN Pi_trunk=CNN +q_trunk.depth=1 +pi_trunk.depth=1 Pool=Identity

A little secret: PyTorch code can also be passed directly via quotes:

python Run.py "eyes='CNN(kwargs.input_shape,32,depth=3)'"
python Run.py "eyes='torch.nn.Conv2d(kwargs.input_shape[0],32,kernel_size=3)'"

Some blocks have default args which can be accessed with the kwargs. interpolation shown above.

An intricate example of the expressiveness of this syntax:

python Run.py Optim=SGD 'Pi_trunk="nn.Sequential(MLP(input_shape=kwargs.input_shape, output_shape=kwargs.output_shape),nn.ReLU(inplace=True))"' lr=0.01

Both the uppercase and lowercase syntax support direct function calls in place of usual syntax, with function calls distinguished by the syntactical quotes and parentheticals.

In both the uppercase and lowercase syntax, the parser automatically registers the imports and class paths in Utils., including torch, torch.nn, and the architectures in ./Blocks/Architectures/ such as CNN, so they can be accessed directly without typing Utils..

To make a custom architecture, you can use any PyTorch module which outputs a tensor. Woohoo, done.

To make it mix-and-matchable throughout UnifiedML for arbitrary dimensionalities and domains, to generalize as much as possible, you can add:

  1. input_shape and output_shape arguments to the __init__ method, such that your architecture can have a defined adaptation scheme for different possible shapes.
  2. Support for arbitrarily many inputs (such as by concatenating them) of weird shapes (by broadcasting them).
  3. A repr_shape(*_) method that pre-computes the output shape given a varying number of input shape dimensions as arguments.
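As a dependency-free sketch of the repr_shape(*_) idea from the list above, the class below does only the shape bookkeeping for a single convolution-like layer. A real architecture would subclass torch.nn.Module and define a forward pass; the class name, the constructor arguments other than input_shape, and the (channels, *spatial) convention are assumptions for illustration.

```python
class ConvShapeSketch:
    """Shape arithmetic only; not a trainable module."""

    def __init__(self, input_shape, out_channels=32, kernel_size=3, stride=1, padding=0):
        self.input_shape = tuple(input_shape)  # (channels, *spatial), assumed convention
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def repr_shape(self, *shape):
        """Pre-compute the output shape for any number of spatial dimensions.

        Uses the standard convolution size formula (dilation 1):
        out = (in + 2 * padding - kernel_size) // stride + 1
        """
        _, *spatial = shape
        out = [(s + 2 * self.padding - self.kernel_size) // self.stride + 1 for s in spatial]
        return (self.out_channels, *out)


# The same method handles 2D images and 1D signals
print(ConvShapeSketch((1, 28, 28)).repr_shape(1, 28, 28))
print(ConvShapeSketch((1, 16000)).repr_shape(1, 16000))
```

Knowing the output shape ahead of time lets downstream parts be sized without running data through the model.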

None of these add-ons are necessary, but if you include all of them, then your architecture can adapt to everything. There are lazy ways to hack all of these features into any architecture, or you can follow the pretty basic templates used in our existing array of architectures. Most of our architectures can probably be used to build whatever architecture you're trying to build, honestly, or at least something similar enough that you could have a good jumping-off point.

In short: To make your own architecture mix-and-matchable, just put it in a pytorch module with initialization options for input_shape and output_shape, as in the architectures in ./Blocks/Architectures.

By the way, the Encoder's Eyes automatically adapt 2D convolutions to 1D convolutions if the data is 1D.

Custom Optimizers

πŸ” Click to search/explore

You can pass in a path to the Optim= flag or select a built-in PyTorch optimizer like SGD, or both as below:

python Run.py Optim=Utils.torch.optim.SGD lr=0.1

Equivalently via the expressive recipe interface:

python Run.py Optim=SGD lr=0.1

or

python Run.py "optim='torch.optim.SGD(kwargs.params, lr=0.1)'"

In the first two examples, the lr= flag was optional. The default learning rate is 1e-4, and we could also have written +optim.lr= instead.

Per-block optimizers. For example, just the Encoder:

python Run.py encoder.Optim=SGD

Learning rate schedulers. Scheduler= works analogously to Optim=, or just use the lr_decay_epochs= shorthand for cosine annealing e.g.

python Run.py task=classify/mnist lr_decay_epochs=100
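For intuition about what a cosine annealing schedule does over lr_decay_epochs, here is a minimal sketch of the standard closed-form cosine decay. The function name and the min_lr floor are assumptions; the base learning rate of 1e-4 is the default mentioned above. How the library wires this into its scheduler is not shown here.

```python
import math


def cosine_annealed_lr(epoch, decay_epochs, base_lr=1e-4, min_lr=0.0):
    """Standard cosine annealing from base_lr (epoch 0) down to min_lr (epoch decay_epochs)."""
    epoch = min(max(epoch, 0), decay_epochs)  # clamp so the rate stays at min_lr afterwards
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / decay_epochs))
    return min_lr + (base_lr - min_lr) * cosine


# Learning rate at the start, midpoint, and end of a 100-epoch decay
for epoch in (0, 50, 100):
    print(epoch, cosine_annealed_lr(epoch, 100))
```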

Custom Env

πŸ” Click to let there be light

As an example of custom environments, we provide the Super Mario Bros. game environment in ./Datasets/Suites/SuperMario.py.

To use it, you can just pass in the path to Env= and specify the suite and the task_name of your choosing:

python Run.py Env=Datasets.Suites.SuperMario.SuperMario suite=SuperMario task_name=Mario


Mario trained via DQN.

Any Hyperparams you don't specify will be inherited from the default task, atari/pong in ./Hyperparams/task/atari/pong.yaml, or whichever task is selected.

ⓘ If you want to save Hyperparams and formally define a task, you can create files like ./Hyperparams/task/mario.yaml in the ./Hyperparams/task/ directory:

# ./Hyperparams/task/mario.yaml
defaults:
  - _self_

Env: Datasets.Suites.SuperMario.SuperMario
suite: SuperMario
task_name: Mario
discrete: true
action_repeat: 4
truncate_episode_steps: 250
nstep: 3
frame_stack: 4
train_steps: 3000000
stddev_schedule: 'linear(1.0,0.1,800000)'

Now you can launch Mario with:

python Run.py task=mario

You can also customize params and worlds and stages with the +env. syntax:

python Run.py task=mario +env.stage=2
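The stddev_schedule string in the mario.yaml example above, 'linear(1.0,0.1,800000)', reads naturally as a linear interpolation from 1.0 to 0.1 over 800000 steps. The sketch below assumes it follows the DrQV2 schedule convention (the repo this library credits as inspiration); the function name and parsing details are illustrative, not the library's confirmed code.

```python
import re


def schedule(spec, step):
    """Evaluate a schedule spec at a given step.

    Accepts a plain number ("0.2") or "linear(init,final,duration)".
    Assumed semantics: interpolate from init to final over `duration` steps, then hold.
    """
    try:
        return float(spec)
    except ValueError:
        pass
    match = re.fullmatch(r"linear\((.+),(.+),(.+)\)", spec)
    if match is None:
        raise ValueError(f"unrecognized schedule: {spec}")
    init, final, duration = (float(group) for group in match.groups())
    mix = min(max(step / duration, 0.0), 1.0)
    return (1.0 - mix) * init + mix * final


for step in (0, 400000, 800000, 3000000):
    print(step, schedule("linear(1.0,0.1,800000)", step))
```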

Custom Dataset

πŸ” Click to read, parse, & boot up

You can pass in any Dataset as follows:

python Run.py task=classify/custom Dataset=torchvision.datasets.MNIST

That will launch MNIST. Another example, with a custom class and path,

python Run.py task=classify/custom Dataset=Datasets.Suites._TinyImageNet.TinyImageNet

This will initiate a classify task on the custom-defined TinyImageNet Dataset.

You can change the task name as it's saved for benchmarking and plotting, with task_name=. The default is the class name such as TinyImageNet.

UnifiedML is compatible with datasets & domains besides Vision.

Thanks to dimensionality adaptivity (slide 12) for example, train the default CNN architecture on raw 1D Audio:

python Run.py task=classify/custom Dataset=Datasets.Suites._SpeechCommands.SpeechCommands Aug=Identity

Gets a perfect score on speech command classification from raw 1D audio with the default CNN setting.

More details and examples

For a non-Vision/Audio tutorial, we provide a full end-to-end example in Crystal classification, reproducing classifying crystal structures and space groups from X-ray diffraction patterns.


Note: You can also specify an independent test dataset explicitly with TestDataset=.

Recipes

πŸ” Learn to cook

Save hyperparams to .yaml files by defining them in the ./Hyperparams/task/ directory. There are many saved examples already.

If you've defined a .yaml file called my_recipe.yaml for example, you can use it via

python Run.py task=my_recipe

Please share your recipes in our Discussions page if you discover new or better hyperparams for a problem.

Recipes can also be defined temporarily via command line without saving them to .yaml files.

Below is a running list of some out-of-the-ordinary or interesting ones:

python Run.py Eyes=Sequential +eyes._targets_="[CNN, Transformer]" task=classify/mnist
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[Transformer, AvgPool]" +pool.positional_encodings=false
python Run.py task=classify/mnist Pool=Residual +pool.model=Transformer +pool.depth=2
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[ChannelSwap, Residual]" +'pool.model="MLP(kwargs.input_shape[-1])"' +'pool.down_sample="MLP(input_shape=kwargs.input_shape[-1])"'
python Run.py task=classify/mnist Pool=RN
python Run.py task=classify/mnist Pool=Sequential +pool._targets_="[RN, AvgPool]"
python Run.py task=classify/mnist Eyes=Perceiver +eyes.depths="[3, 3, 2]"  +eyes.num_tokens=128
python Run.py task=classify/mnist Predictor=Perceiver +predictor.token_dim=32
python Run.py task=classify/mnist Predictor=Perceiver train_steps=2
python Run.py task=dmc/cheetah_run Predictor=load +predictor.path=./Checkpoints/Exp/DQNAgent/classify/MNIST_1.pt +predictor.attr=actor.Pi_head +predictor.device=cpu save=false
python Run.py task=classify/mnist Eyes=Identity Predictor=Perceiver +predictor.depths=10
python Run.py Aug=Sequential +aug._targets_="[IntensityAug, RandomShiftsAug]" +aug.scale=0.05 aug.pad=4

These are also useful for testing whether I've broken things.

Experiment naming, plotting

πŸ” Click to see

Plots automatically save to ./Benchmarking/<experiment>/; the default experiment is experiment=Exp.

python Run.py

📈 📊 --> ./Benchmarking/Exp/

Optionally plot multiple experiments

python Run.py experiment=Exp2 plotting.plot_experiments="['Exp', 'Exp2']"

Alternatively, you can call Plot.py directly

python Plot.py plot_experiments="['Exp', 'Exp2']"

to generate plots. Here, the <experiment> directory name will be the underscore_concatenated union of all experiment names ("Exp_Exp2").

Plotting also accepts regular expressions. For example, to plot all experiments with Exp in the name:

python Plot.py plot_experiments="['Exp.*']"
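To picture how a list of patterns could select experiments, and how the combined directory name described above is formed, here is a small sketch. Matching with re.match (anchored at the start of the name) is an assumption, as are both function names; Plot.py may match differently.

```python
import re


def select_experiments(patterns, available):
    """Return the available experiment names matched by any pattern, in order, without repeats."""
    selected = []
    for name in available:
        if name not in selected and any(re.match(pattern, name) for pattern in patterns):
            selected.append(name)
    return selected


def plot_dir_name(experiment_names):
    """Underscore-concatenate experiment names, as in "Exp_Exp2"."""
    return "_".join(experiment_names)


available = ["Exp", "Exp2", "Baseline"]  # hypothetical experiment folders
print(select_experiments(["Exp.*"], available))
print(plot_dir_name(["Exp", "Exp2"]))
```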

Another option is to use WandB, which is supported by UnifiedML:

python Run.py logger.wandb=true

You can connect UnifiedML to your WandB account by first running wandb login in your Conda environment.

To do a hyperparameter sweep, just use the -m flag.

python Run.py -m task=atari/pong,classify/mnist seed=1,2,3 
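Assuming the -m flag behaves like Hydra's multirun mode, comma-separated values are expanded into every combination, so the command above would launch six runs. The sketch below enumerates them; expand_sweep is a hypothetical helper, and its naive comma splitting does not handle quoted lists.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated override values into one list of overrides per run."""
    keys = [override.split("=", 1)[0] for override in overrides]
    values = [override.split("=", 1)[1].split(",") for override in overrides]
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*values)
    ]


runs = expand_sweep(["task=atari/pong,classify/mnist", "seed=1,2,3"])
for run in runs:
    print("python Run.py " + " ".join(run))
```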

Log video during evaluations with log_media=true.

Publishing

πŸ” Click to write your own paper

We have released our slide deck!

Templates available here

Feel free to use our UnifiedML templates and figures in your work, citing us of course.

Open-source research for minimal redundancy and optimal standardization is the way to go, balancing privacy and de-centrality, and streamlining successive works that depend on ours in good faith. Post your own designs and assets here in the discussion board. Read the rules to keep citations and credit attribution fair.

📊 Agents & Performances

Atari

We can attain 100% mean human-normalized score across the Atari-26 benchmark suite in about 1m environment steps.
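For readers unfamiliar with the metric, the sketch below uses the common definition of human-normalized score, (agent - random) / (human - random), averaged over games. The source does not spell out its formula, so this definition is an assumption, and the game names and numbers are made up purely for illustration.

```python
def human_normalized(agent, random, human):
    """Score of 0.0 means random-level play; 1.0 (100%) means human-level play."""
    return (agent - random) / (human - random)


def mean_human_normalized(results):
    """Average the normalized score over games; `results` maps game -> (agent, random, human)."""
    scores = [human_normalized(*triple) for triple in results.values()]
    return sum(scores) / len(scores)


# Hypothetical raw scores, not real benchmark numbers
results = {
    "game_a": (30.0, 10.0, 50.0),   # halfway between random and human
    "game_b": (150.0, 0.0, 100.0),  # above human level
}
print(mean_human_normalized(results))
```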

The below example script shows how to launch training for just Pong and Breakout with AC2Agent:

python Run.py task=atari/pong,atari/breakout -m

The results are reported for all 26 games and 3 different agents.

We found these results to be pretty stable across a range of exploration rates as well.

Each time point averages over 10 evaluation episodes (and 26 games).

DCGAN

The simplest way to do DCGAN is to use the DCGAN architecture:

python Run.py task=classify/celeba generate=true Discriminator=DCGAN.Discriminator Generator=DCGAN.Generator train_steps=50000

We can then improve the results, and speed up training tenfold, by modifying the hyperparameters:

python Run.py task=classify/celeba generate=true Discriminator=DCGAN.Discriminator Generator=DCGAN.Generator z_dim=100 Aug=Identity Optim=Adam '+optim.betas=[0.5, 0.999]' lr=2e-4 +agent.num_critics=1 train_steps=5000

⁉️ How is this possible

We use our new Creator framework to unify RL discrete and continuous action spaces, as elaborated in our paper.

Then we frame actions as "predictions" in supervised learning. We can even augment supervised learning with an RL phase, treating reward as negative error.

For generative modeling, well, it turns out that the difference between a Generator-Discriminator and Actor-Critic is rather nominal.

🎓 Pedagogy and Research

All files are designed for pedagogical clarity and extendability for research, to be useful for educational and innovational purposes, with simplicity at heart.

πŸ§‘β€πŸ€β€πŸ§‘ Contributing

Please support financially by Sponsoring.

We are a nonprofit, single-PhD student team. If possible, compute resources are appreciated too.

Feel free to contact agi.__init__.

I am always looking for collaborators. Don't hesitate to volunteer in any way to help realize the full potential of this library.


MIT license Included.

Non-legacy version: here.
