[RLlib] Examples folder cleanup vol. 45: Enhance/redo autoregressive action distribution example. #49967
Conversation
@@ -1,223 +0,0 @@
# @OldAPIStack
old API stack -> remove
LGTM. Some general questions regarding the simplification of the `RLModule` examples. Otherwise small nits.
obs + a1. For example, if obs is -0.3 and a1 was sampled to be 1, then the value of
the first reward component is:
r1 = -abs(1.0 - [obs+a1]) = -abs(1.0 - (-0.3 + 1)) = -abs(0.3) = -0.3
The second reward component is computed as the negative absolute value
Maybe better to keep the same values as used for the first reward component.
desired_a2 = (
    self.state[0] * a1
)  # Autoregressive relationship dependent on state
# r1 depends on how well q1 is aligned with obs:
q1 -> a1 ;)
fixed
@override(TorchRLModule)
def get_inference_action_dist_cls(self):
    return TorchMultiDistribution.get_partial_dist_cls(
This is a nice example of how to use our `MultiDistribution` classes. I wonder, however, why we don't use it above, where the distributions are used for sampling? In training, I guess exactly this distribution is used for the action logps, isn't it?
It's still a very messy API:

- `from_logits` requires a single tensor, which then gets split into n sub-tensors with the help of another method 🤔
- The `ctor`, however, utilizes a nested structure of subcomponents, which should probably be the norm. I'm wondering whether we can simplify the entire `Distributions` API further by getting rid of `from_logits` altogether.
- Another, even more radical thought would be to get rid of `Distribution`/`TorchDistribution` and use `torch.Distribution` instead 😉.

But we should clean this up separately.
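For context on the flat-logits convention discussed above, here is a minimal plain-PyTorch sketch (not RLlib's `Distribution` API) of how a single flat tensor gets split into per-component sub-distributions; the sizes and names are illustrative assumptions for a Discrete(3) + Box(1,) action space:

```python
import torch
from torch.distributions import Categorical, Normal

# Assumed flat layout: 3 categorical logits, then mean and log-std of a 1-D Gaussian.
input_lens = [3, 2]

def split_and_build(flat_logits: torch.Tensor):
    # Split the single flat tensor into per-component parameter chunks.
    cat_logits, gauss_params = torch.split(flat_logits, input_lens, dim=-1)
    mean, log_std = torch.chunk(gauss_params, 2, dim=-1)
    return Categorical(logits=cat_logits), Normal(mean, log_std.exp())

flat = torch.randn(4, sum(input_lens))        # batch of 4
dist_a1, dist_a2 = split_and_build(flat)
a1, a2 = dist_a1.sample(), dist_a2.sample()   # shapes (4,) and (4, 1)
total_logp = dist_a1.log_prob(a1) + dist_a2.log_prob(a2).sum(-1)  # shape (4,)
```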
assert self.action_space[0].n == 3
assert isinstance(self.action_space[1], gym.spaces.Box)

self._prior_net = nn.Sequential(
This new definition of the autoregressive module simplifies the setup and is more readable, I admit. However, it completely leaves out how users could/should use the `RLModule` attributes as well as the `Catalog`, which makes setting up networks simple via pre-configured hyperparameters. Given that most current examples define the `RLModule` directly without leveraging its specific attributes, methods, or the `Catalog` - and recognizing that this often presents a challenge - we should evaluate our approach. Specifically, we should consider whether to (a) provide these same examples also in a form that explicitly demonstrates the use of these class attributes, methods, and the `Catalog`, or (b) streamline the `RLModule` by simplifying it further and potentially removing the `Catalog` entirely.
By adding too many APIs (`Catalog`) to this example that have nothing to do with the actual problem to be solved here (autoregressive sampling of the two action components), we would be cluttering this example.

I agree, we need to think about the usefulness of `Catalog` in general. My take has been for a while now to get rid of it and simply replace it with utility functions that the user may use, but doesn't have to. Most users don't have the problem of 1 Model -> 100 spaces (like RLlib does!), but have a relatively fixed space and thus also a relatively stable model encoder architecture that doesn't require a complex "catalog decision tree".
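For illustration, a minimal sketch (placeholder sizes, not the example's actual code) of the Catalog-free, "define the networks directly" style being discussed, with one prior head for the discrete component and one posterior head conditioned on the observation plus the one-hot first action:

```python
import torch.nn as nn

# All sizes are placeholder assumptions; in the example they would be
# derived from the observation and action spaces.
obs_dim, hidden, num_discrete = 1, 64, 3

# Prior head: obs -> logits for the first (discrete) action component.
prior_net = nn.Sequential(
    nn.Linear(obs_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, num_discrete),
)

# Posterior head: [obs, one-hot(a1)] -> mean and log-std for the second
# (continuous) action component.
posterior_net = nn.Sequential(
    nn.Linear(obs_dim + num_discrete, hidden),
    nn.ReLU(),
    nn.Linear(hidden, 2),
)
```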
and a2 was sampled to be -0.7, then the value of the second reward component is:
r2 = -abs(obs + a1 + a2) = -abs(0.5 + 0 - 0.7) = -abs(-0.2) = -0.2

Because of this specific reward function, the agent must learn to optimally sample
Maybe describe in a sentence why this specific reward design supports the agent in learning correlated actions?
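As a rough illustration of why this reward design requires correlated actions: the second component is maximized only when a2 is close to -(obs + a1), so the best a2 depends on the already-sampled a1. A tiny sketch derived from the docstring text above (not the env implementation itself):

```python
def reward_components(obs: float, a1: int, a2: float):
    r1 = -abs(1.0 - (obs + a1))   # best when obs + a1 is close to 1.0
    r2 = -abs(obs + a1 + a2)      # best when a2 is close to -(obs + a1)
    return r1, r2

print(round(reward_components(-0.3, 1, 0.0)[0], 2))   # -0.3, the first worked example
print(round(reward_components(0.5, 0, -0.7)[1], 2))   # -0.2, the second worked example
```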
# Posterior forward pass.
posterior_batch = torch.cat(
    [obs, one_hot(a1, self.action_space[0])],
Interesting. I tried both `one-hot` and "raw" actions in multiple examples and could not see a difference. How many iterations does it need now to converge?
Yeah, I think the existing example was not learning, or only super slowly. The new example definitely learns both action components' dependency on 1) obs and 2) obs + [first action].
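For readers comparing one-hot vs. raw conditioning, here is a minimal plain-PyTorch sketch of the concatenation shown in the diff; RLlib's `one_hot` utility is replaced by `torch.nn.functional.one_hot`, and shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# The sampled discrete action a1 is one-hot encoded and concatenated to the
# observation before the posterior forward pass.
obs = torch.randn(4, 1)                        # batch of 4, 1-D observation
a1 = torch.randint(0, 3, (4,))                 # sampled first action, Discrete(3)
posterior_batch = torch.cat([obs, F.one_hot(a1, num_classes=3).float()], dim=-1)
print(posterior_batch.shape)                   # torch.Size([4, 4])
```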
Examples folder cleanup vol. 45: Enhance/redo autoregressive action distribution example.
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.