Does Stoix support action masking for unused actions when creating the environment itself? #132

veerendrav · 2025-01-26T07:46:26Z

I am trying to run DDQN on Navix Four Rooms, and some actions are not used. The default action space has 7 actions (left, right, forward, pickup, drop, toggle, done). The last 4 actions are irrelevant for this environment. When I inspect the timestep returned from env.reset(), the observation object has the attributes agent_view, action_mask, and step_count. The action_mask is all ones, so it looks like a dummy variable that is not used anywhere in the code.

Can I selectively modify the action space when creating the environment itself? Is this already implemented, or do I need to write my own code for it?

EdanToledo · 2025-01-26T10:31:18Z

Hello,

So currently action masking is not properly supported. There is the basic infrastructure to implement it as you've seen. It's just a dummy variable for now but to implement it would be easy. You would simply need to do two things:

1. Actually implement the action masking in the wrapper, so that it returns a real action mask rather than a dummy array of ones.
2. Create a new action head (and possibly torso) that takes the action mask as input (in addition to the processed observation embeddings) and applies the masking to the policy distribution.
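
As a rough illustration of the second step, here is a minimal sketch of how a mask could be applied to Q-values or policy logits in JAX. This is not Stoix's implementation: the function names `mask_logits` and `masked_greedy_action` are hypothetical, and the mask is assumed to be an array with 1 for legal actions and 0 for illegal ones, matching the shape of the action dimension.

```python
import jax
import jax.numpy as jnp


def mask_logits(logits, action_mask):
    """Replace the logits (or Q-values) of illegal actions with a very negative value.

    Hypothetical helper: `action_mask` is assumed to hold 1 for legal actions
    and 0 for illegal ones.
    """
    very_negative = jnp.finfo(logits.dtype).min
    return jnp.where(action_mask.astype(bool), logits, very_negative)


def masked_greedy_action(q_values, action_mask):
    """Greedy action selection restricted to legal actions (e.g. for DQN-style agents)."""
    return jnp.argmax(mask_logits(q_values, action_mask), axis=-1)


# Seven default actions, of which only the first three are assumed legal.
mask = jnp.array([1, 1, 1, 0, 0, 0, 0])
q_values = jnp.array([0.1, 0.5, 0.2, 2.0, 1.5, 0.9, 3.0])

# The unmasked argmax would pick index 6; the masked one stays within the legal set.
action = masked_greedy_action(q_values, mask)

# For a policy head, a softmax over the masked logits gives illegal actions zero probability.
probs = jax.nn.softmax(mask_logits(q_values, mask))
```

The same `mask_logits` call could sit inside a custom action head just before the distribution is constructed, which is why the mask has to be passed through the network alongside the observation embedding.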

Let me know if you need any help with this or have any further questions.

veerendrav · 2025-01-26T17:24:37Z

Thanks for your response @EdanToledo.

After reviewing the Navix source code, I noticed that they do not provide a convenience function to retrieve the legal actions on a per-environment basis. By default, it seems they use the MiniGrid action set (rotate_ccw, rotate_cw, forward, pickup, drop, toggle, done) for all environments. As a result, the only way to implement this functionality in a wrapper is to hardcode it in some form.

As a workaround, I found that I can add custom actions while creating the environment itself by using the additional action_set argument in the navix.make function (in make_env.py). For example, I can specify an action set like (navix.actions.rotate_ccw, navix.actions.rotate_cw, navix.actions.forward). Furthermore, I could modify the make_navix_env function to read the legal actions from a configuration file. This adjusts the action space of the created environment, and I believe it eliminates the need to create any new action heads.
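
A minimal sketch of the config-driven variant described above, under stated assumptions: the helper name `resolve_action_set` and the list of action names are hypothetical, and a stand-in namespace is used in place of `navix.actions` so the snippet runs without Navix installed. The final Navix calls are shown only as comments and rely on the `action_set` argument of `navix.make` mentioned above.

```python
from types import SimpleNamespace


def resolve_action_set(action_names, actions_namespace):
    """Look up action callables by name on a namespace such as `navix.actions`.

    Hypothetical helper: `action_names` would come from a configuration file.
    """
    missing = [name for name in action_names if not hasattr(actions_namespace, name)]
    if missing:
        raise ValueError(f"Unknown action names: {missing}")
    return tuple(getattr(actions_namespace, name) for name in action_names)


# Stand-in for `navix.actions`, used here only so the example is self-contained.
stand_in_actions = SimpleNamespace(
    rotate_ccw=lambda state: state,
    rotate_cw=lambda state: state,
    forward=lambda state: state,
    pickup=lambda state: state,
)

# Names as they might appear in a config entry (assumed layout).
legal_action_names = ["rotate_ccw", "rotate_cw", "forward"]
action_set = resolve_action_set(legal_action_names, stand_in_actions)

# With Navix installed, the same helper could be used roughly like this:
#   import navix
#   action_set = resolve_action_set(legal_action_names, navix.actions)
#   env = navix.make(env_id, action_set=action_set)  # env_id taken from the config
```

Because the environment is then created with a smaller action space, the network's output dimension shrinks accordingly and no mask has to be threaded through the policy.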

Please let me know if my understanding is incorrect.

EdanToledo · 2025-01-26T18:29:16Z

I'm not too familiar with Navix in practice, but that sounds easier than needing to explicitly do action masking in the network itself.
