-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Fix bugs in MultiAgentEpisode
class (getters, env_t_to_agent_t
system, etc..).
#49191
[RLlib] Fix bugs in MultiAgentEpisode
class (getters, env_t_to_agent_t
system, etc..).
#49191
Conversation
Signed-off-by: sven1977 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some small nits in regard to commenting.
!= self.SKIP_ENV_TS_TAG | ||
): | ||
len_lookback_buffer = max(len_lookback_buffer, i) | ||
for i in range(orig_len_lb, len(self.env_t_to_agent_t[agent_id].data) + 1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we access here directly the raw data
attribute when we have the InfiniteLookbackBuffer
defined as a class with certain length methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question and yes, this is a little hacky here. The len method returns the length but w/o(!) the lookback buffer.
@@ -2325,6 +2293,8 @@ def _get_data_by_env_steps_as_list( | |||
hanging_val, | |||
filter_for_skip_indices=idxes[i], | |||
) | |||
if what == "extra_model_outputs" and not inf_lookback_buffer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess inf_lookback_buffer
is a bool. Imo the code becomes better readable, if we change the return values of _get_inf_lookback_buffer_or_dict
to a indices_to_use
only and instead check here for the type.
return { | ||
key: sub_buffer.get( | ||
indices=index_incl_lookback - sub_buffer.lookback, | ||
neg_index_as_lookback=True, | ||
fill=fill, | ||
_add_last_ts_value=hanging_val, | ||
_add_last_ts_value=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tricky. How did you find it?
{"a2": 3}, | ||
{"a2": 4}, | ||
|
||
# <- BUT: actual cut here, b/c of hanging action of a2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify here why this action is hanging? And means "cut" MAE.cut
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 done.
if _add_last_ts_value is not None: | ||
data_to_use = np.append(data_to_use.copy(), _add_last_ts_value) | ||
if self.finalized: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I remember this location to be part of the error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! Thanks for confirming! This was a pretty common bug you would run into when your multi-agent episode is a little bit more unstructured wrt agents stepping patterns.
Signed-off-by: sven1977 <[email protected]>
…nt_t` system, etc..). (ray-project#49191)
…nt_t` system, etc..). (ray-project#49191) Signed-off-by: ujjawal-khare <[email protected]>
Fix bugs in
MultiAgentEpisode
class:env_t_to_agent_t
system.Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.