[RLlib] Cleanup examples folder (new API stack) vol 31: Add hierarchical training example script. #49127

Merged
Changes shown from 18 of 36 commits
10718e7
wip
sven1977 Nov 4, 2024
f6caa54
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Nov 5, 2024
45d16fa
wip
sven1977 Nov 5, 2024
e02f5ad
wip
sven1977 Nov 5, 2024
2e507ec
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Nov 7, 2024
4c04b04
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Nov 11, 2024
36fb8d4
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Nov 17, 2024
d3da672
wip
sven1977 Nov 19, 2024
b8d502f
wip
sven1977 Nov 20, 2024
542d22a
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Dec 3, 2024
724d350
wip
sven1977 Dec 4, 2024
408f633
wip
sven1977 Dec 4, 2024
a632872
wip
sven1977 Dec 6, 2024
6185746
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Dec 6, 2024
1a99237
wip
sven1977 Dec 6, 2024
c42b435
wip
sven1977 Dec 6, 2024
36bea65
wip
sven1977 Dec 6, 2024
466f53d
wip
sven1977 Dec 6, 2024
dd40979
wip
sven1977 Dec 10, 2024
9f3e607
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Dec 10, 2024
1aa4ab0
wip
sven1977 Dec 10, 2024
9f8c33a
wip
sven1977 Dec 10, 2024
874183c
Merge branch 'fix_further_bugs_in_multi_agent_episode' into cleanup_e…
sven1977 Dec 10, 2024
240720a
wip
sven1977 Dec 10, 2024
6715808
LINT
sven1977 Dec 10, 2024
d289127
running fine w/o crashes
sven1977 Dec 11, 2024
281c30a
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Dec 11, 2024
98c5e4f
wip
sven1977 Dec 11, 2024
5e71845
wip
sven1977 Dec 11, 2024
8d46aec
wip
sven1977 Dec 11, 2024
72bec12
wip
sven1977 Dec 11, 2024
a1369a7
wip
sven1977 Dec 11, 2024
78b36fc
wip
sven1977 Dec 11, 2024
81f7a57
wip
sven1977 Dec 11, 2024
cd61bb3
wip
sven1977 Dec 11, 2024
5209af7
wip
sven1977 Dec 11, 2024
10 changes: 10 additions & 0 deletions doc/source/rllib/rllib-examples.rst
@@ -186,6 +186,16 @@ GPU (for Training and Sampling)
with performance improvements during evaluation.


Hierarchical Training
+++++++++++++++++++++

- `Hierarchical RL Training <https://github.com/ray-project/ray/blob/master/rllib/examples/hierarchical/hierarchical_training.py>`__:
Showcases a hierarchical RL setup inspired by automatic subgoal discovery and subpolicy specialization. A high-level policy selects subgoals and assigns one of three
specialized low-level policies to achieve them within a time limit, encouraging specialization and efficient task-solving.
The agent has to navigate a complex grid-world environment. The example highlights the advantages of hierarchical
learning over flat approaches by demonstrating significantly improved learning performance in challenging, goal-oriented tasks.
Collaborator comment: Awesome!!
Collaborator comment: This is just really interesting
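
As a rough orientation for readers of this entry (not taken from the example script itself), the setup described above amounts to a multi-agent config with one high-level module, three low-level modules, and a mapping function. All identifiers in this sketch (env name, module IDs, agent-ID scheme) are assumptions made for illustration only:

# Hedged sketch; module/agent IDs and the mapping rule are illustrative
# assumptions, not necessarily what hierarchical_training.py actually uses.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("placeholder_hierarchical_env")  # assumed to be registered elsewhere
    .multi_agent(
        # One high-level policy plus three specialized low-level policies.
        policies={"high_level", "low_level_0", "low_level_1", "low_level_2"},
        # Assumed agent-ID scheme: the high-level agent maps to "high_level";
        # an agent ID like "low_level_agent_2" maps to "low_level_2".
        policy_mapping_fn=(
            lambda aid, episode, **kw: "high_level"
            if aid == "high_level_agent"
            else f"low_level_{aid[-1]}"
        ),
    )
)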



Inference (of Models/Policies)
++++++++++++++++++++++++++++++

17 changes: 3 additions & 14 deletions rllib/BUILD
@@ -2590,24 +2590,13 @@ py_test(

# subdirectory: hierarchical/
# ....................................
-#@OldAPIStack
 py_test(
-    name = "examples/hierarchical/hierarchical_training_tf",
+    name = "examples/hierarchical/hierarchical_training",
     main = "examples/hierarchical/hierarchical_training.py",
     tags = ["team:rllib", "exclusive", "examples"],
-    size = "medium",
-    srcs = ["examples/hierarchical/hierarchical_training.py"],
-    args = [ "--framework=tf", "--stop-reward=0.0"]
-)
-
-#@OldAPIStack
-py_test(
-    name = "examples/hierarchical/hierarchical_training_torch",
-    main = "examples/hierarchical/hierarchical_training.py",
-    tags = ["team:rllib", "exclusive", "examples"],
-    size = "medium",
+    size = "large",
     srcs = ["examples/hierarchical/hierarchical_training.py"],
-    args = ["--framework=torch", "--stop-reward=0.0"]
+    args = ["--enable-new-api-stack", "--as-test", "--stop-reward=4.0", "--map=large", "--time-limit=50"]
 )

# subdirectory: inference/
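
The replacement test target above runs the example once on the new API stack as a self-checking test (--as-test) that must reach an episode return of 4.0 on the large map with a time limit of 50. A minimal, plain-argparse sketch of how a script could consume these flags; the argument names mirror the BUILD args, everything else is an assumption about the script's internals:

# Plain-Python sketch of the CLI surface exercised by the BUILD target above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable-new-api-stack", action="store_true")
parser.add_argument("--as-test", action="store_true")
parser.add_argument("--stop-reward", type=float, default=0.0)
parser.add_argument("--map", type=str, default="small")    # assumed default
parser.add_argument("--time-limit", type=int, default=50)  # assumed default
args = parser.parse_args(
    ["--enable-new-api-stack", "--as-test", "--stop-reward=4.0", "--map=large", "--time-limit=50"]
)
print(args.stop_reward, args.map, args.time_limit)  # 4.0 large 50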
2 changes: 1 addition & 1 deletion rllib/env/multi_agent_env.py
@@ -51,7 +51,7 @@ class MultiAgentEnv(gym.Env):
     # This attribute should not be changed during the lifetime of this env.
     possible_agents: List[AgentID] = []

-    # @OldAPIStack
+    # @OldAPIStack, use `observation_spaces` and `action_spaces`, instead.
     observation_space: Optional[gym.Space] = None
     action_space: Optional[gym.Space] = None

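The reworded comment points users of the new API stack to the per-agent `observation_spaces` / `action_spaces` dicts instead of the old single-space attributes. A minimal sketch of that pattern in a custom env; agent IDs and spaces are made up here, and reset()/step() are omitted:

# Hedged sketch of per-agent spaces on the new API stack (illustrative only).
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoAgentEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        # Dict-of-spaces keyed by agent ID (instead of one shared
        # `observation_space` / `action_space`).
        self.observation_spaces = {
            aid: gym.spaces.Box(-1.0, 1.0, (4,)) for aid in self.agents
        }
        self.action_spaces = {aid: gym.spaces.Discrete(2) for aid in self.agents}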
17 changes: 10 additions & 7 deletions rllib/env/multi_agent_episode.py
@@ -448,7 +448,7 @@ def add_env_step(
                     action_space=self.action_space.get(agent_id),
                 )
             else:
-                sa_episode = self.agent_episodes.get(agent_id)
+                sa_episode = self.agent_episodes[agent_id]

             # Collect value to be passed (at end of for-loop) into `add_env_step()`
             # call.
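
Switching from `self.agent_episodes.get(agent_id)` to direct indexing makes a missing agent ID fail immediately at the lookup instead of surfacing later as a confusing `AttributeError` on `None`. A plain-Python illustration of the difference (toy data, not RLlib code):

episodes = {"agent_0": object()}
assert episodes.get("agent_1") is None  # silently defers the error
try:
    episodes["agent_1"]                 # fails loudly, right where the bug is
except KeyError as err:
    print("missing agent id:", err)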
@@ -551,8 +551,8 @@ def add_env_step(
                     # duplicate the previous one (this is a technical "fix" to properly
                     # complete the single agent episode; this last observation is never
                     # used for learning anyway).
-                    _observation = sa_episode.get_observations(-1)
-                    _infos = sa_episode.get_infos(-1)
+                    _observation = sa_episode._last_added_observation
+                    _infos = sa_episode._last_added_infos
                     # Agent is still alive.
                     # [previous obs] [action] (hanging) ...
                 else:

Collaborator comment: Sweet!
@@ -595,8 +595,8 @@ def add_env_step(
                     # duplicate the previous one (this is a technical "fix" to properly
                     # complete the single agent episode; this last observation is never
                     # used for learning anyway).
-                    _observation = sa_episode.get_observations(-1)
-                    _infos = sa_episode.get_infos(-1)
+                    _observation = sa_episode._last_added_observation
+                    _infos = sa_episode._last_added_infos
                     # `_action` is already `get` above. We don't need to pop out from
                     # the cache as it gets wiped out anyway below b/c the agent is
                     # done.
@@ -1770,7 +1770,7 @@ def get_state(self) -> Dict[str, Any]:
             # TODO (simon): Check, if we can store the `InfiniteLookbackBuffer`
             "env_t_to_agent_t": self.env_t_to_agent_t,
             "_hanging_actions_end": self._hanging_actions_end,
-            "_hanging_extra_model_outputs_end": (self._hanging_extra_model_outputs_end),
+            "_hanging_extra_model_outputs_end": self._hanging_extra_model_outputs_end,
             "_hanging_rewards_end": self._hanging_rewards_end,
             "_hanging_actions_begin": self._hanging_actions_begin,
             "_hanging_extra_model_outputs_begin": (
@@ -2532,12 +2532,15 @@ def _get_single_agent_data_by_index(
             # buffer, but a dict mapping keys to individual infinite lookback
             # buffers.
             if extra_model_outputs_key is None:
+                assert hanging_val is None or isinstance(hanging_val, dict)
                 return {
                     key: sub_buffer.get(
                         indices=index_incl_lookback - sub_buffer.lookback,
                         neg_index_as_lookback=True,
                         fill=fill,
-                        _add_last_ts_value=hanging_val,
+                        _add_last_ts_value=(
+                            None if hanging_val is None else hanging_val[key]
+                        ),
                         **one_hot_discrete,
                     )
                     for key, sub_buffer in inf_lookback_buffer.items()

Author comment: another bug fix
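
The fix passes each sub-buffer only its own hanging value: `hanging_val` here is a dict keyed by extra-model-output name, while every `sub_buffer` stores just one of those streams. A plain-Python illustration of the intended per-key behavior (toy data, not RLlib code):

hanging_val = {"action_logp": -0.7, "vf_preds": 0.3}
sub_buffers = {"action_logp": [-0.5, -0.6], "vf_preds": [0.1, 0.2]}

fixed = {
    key: buf + [None if hanging_val is None else hanging_val[key]]
    for key, buf in sub_buffers.items()
}
# fixed["action_logp"] == [-0.5, -0.6, -0.7]; appending the whole `hanging_val`
# dict (the previous behavior) would have mixed a dict into every sub-buffer.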
11 changes: 11 additions & 0 deletions rllib/env/single_agent_episode.py
@@ -163,6 +163,8 @@ class SingleAgentEpisode:
"t",
"t_started",
"_action_space",
"_last_added_observation",
"_last_added_infos",
"_last_step_time",
"_observation_space",
"_start_time",
@@ -346,6 +348,9 @@ def __init__(
         self._start_time = None
         self._last_step_time = None

+        self._last_added_observation = None
+        self._last_added_infos = None
+
         # Validate the episode data thus far.
         self.validate()

@@ -380,6 +385,9 @@ def add_env_reset(
         self.observations.append(observation)
         self.infos.append(infos)

+        self._last_added_observation = observation
+        self._last_added_infos = infos
+
         # Validate our data.
         self.validate()

@@ -434,6 +442,9 @@ def add_env_step(
         self.is_terminated = terminated
         self.is_truncated = truncated

+        self._last_added_observation = observation
+        self._last_added_infos = infos
+
         # Only check spaces if finalized AND every n timesteps.
         if self.is_finalized and self.t % 50:
             if self.observation_space is not None:
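
The new `_last_added_observation` / `_last_added_infos` caches are written on every `add_env_reset()` and `add_env_step()` call and then read by `MultiAgentEpisode` in place of `get_observations(-1)` / `get_infos(-1)` (see the hunks further up). The attribute list at the top of this class looks like a `__slots__`-style declaration, which is why the two new names also have to be registered there; a generic Python reminder of the mechanism (not RLlib code):

# With __slots__, assigning an attribute that was not declared raises immediately.
class Cache:
    __slots__ = ("a",)

c = Cache()
c.a = 1
try:
    c.b = 2  # "b" is not listed in __slots__
except AttributeError as err:
    print(err)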
13 changes: 11 additions & 2 deletions rllib/env/utils/infinite_lookback_buffer.py
@@ -533,9 +533,18 @@ def _get_int_index(
     ):
         data_to_use = self.data
         if _ignore_last_ts:
-            data_to_use = self.data[:-1]
+            if self.finalized:
+                data_to_use = tree.map_structure(lambda s: s[:-1], self.data)
+            else:
+                data_to_use = self.data[:-1]
         if _add_last_ts_value is not None:
-            data_to_use = np.append(data_to_use.copy(), _add_last_ts_value)
+            if self.finalized:
+                data_to_use = tree.map_structure(
+                    lambda s, last: np.append(s, last), data_to_use, _add_last_ts_value
+                ) # np.append(data_to_use.copy(), _add_last_ts_value)
+            else:
+                data_to_use = data_to_use.copy()
+                data_to_use.append(_add_last_ts_value)

         # If index >= 0 -> Ignore lookback buffer.
         # Otherwise, include lookback buffer.

Author comment: these were all bugs
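
Once a buffer is finalized, its contents are a (possibly nested) structure of numpy arrays, so dropping or appending a single timestep has to be done leaf-by-leaf via `tree.map_structure` rather than by plain list slicing or `list.append`, which is what the fixed branches above do. A small standalone illustration of that pattern (toy data, not RLlib code):

import numpy as np
import tree  # dm_tree, the same helper RLlib uses

finalized = {"obs": np.arange(5), "state": {"h": np.arange(5) * 10}}

# Drop the last timestep on every leaf (the `_ignore_last_ts` branch).
trimmed = tree.map_structure(lambda s: s[:-1], finalized)

# Append one hanging value per leaf (the `_add_last_ts_value` branch).
hanging = {"obs": 5, "state": {"h": 50}}
extended = tree.map_structure(lambda s, last: np.append(s, last), trimmed, hanging)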