Does Stoix support action masking for unused actions when creating the environment itself? #132

Open
veerendrav opened this issue Jan 26, 2025 · 3 comments

Comments

@veerendrav

I am trying to run DDQN on Navix Four Rooms, and some actions are not used. The default action space has 7 actions (left, right, forward, pickup, drop, toggle, done). The last 4 actions are irrelevant for this environment. When I inspect the timestep returned from env.reset(), the observation object has the attributes agent_view, action_mask, and step_count. The action_mask is all ones, so it looks like a dummy variable that is not used anywhere in the code.

Can I selectively modify the action space when creating the environment itself? Is this already implemented, or do I need to write my own code for it?

@EdanToledo
Owner

Hello,

Currently, action masking is not properly supported. The basic infrastructure for it exists, as you've seen: the action mask is just a dummy variable for now, but implementing it should be easy. You would need to do two things:

  1. Implement the action masking in the wrapper, so that it returns a real action mask rather than a dummy mask of all ones.
  2. Create a new action head (and possibly torso) that takes the action mask as input, in addition to the processed observation embeddings, and applies the masking to the policy distribution.
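
The two steps above can be sketched roughly as follows. This is only an illustration in plain JAX, not Stoix's actual wrapper or network code: `make_action_mask` and `mask_values` are hypothetical helper names, and the legal indices assume that the first three of the seven default actions are the movement actions, as described in the question.

```python
import jax
import jax.numpy as jnp

NUM_ACTIONS = 7            # size of the default action set described in the issue
LEGAL_ACTIONS = (0, 1, 2)  # assumption: the first three indices are the movement actions


def make_action_mask(num_actions, legal_actions):
    """Step 1 (wrapper side): 1.0 at legal action indices, 0.0 elsewhere."""
    mask = jnp.zeros(num_actions, dtype=jnp.float32)
    return mask.at[jnp.array(legal_actions)].set(1.0)


def mask_values(values, action_mask):
    """Step 2 (head side): push illegal entries to the most negative finite float."""
    return jnp.where(action_mask > 0, values, jnp.finfo(values.dtype).min)


action_mask = make_action_mask(NUM_ACTIONS, LEGAL_ACTIONS)

# Made-up head outputs; the illegal actions deliberately have the largest values.
head_outputs = jnp.array([0.1, 0.4, 0.2, 5.0, 3.0, 2.0, 1.0])
masked = mask_values(head_outputs, action_mask)

# Value-based agents (such as DDQN): greedy action over the masked Q-values.
greedy_action = int(jnp.argmax(masked))

# Policy-based agents: softmax over masked logits gives illegal actions zero probability.
probs = jax.nn.softmax(masked)
```

Using a large negative finite value instead of `-inf` is a common choice because it avoids NaNs if every action is ever masked out; whether that suits a given agent is a judgment call.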

Let me know if you need any help with this or have any further questions.

@veerendrav
Author

veerendrav commented Jan 26, 2025

Thanks for your response @EdanToledo.

After reviewing the Navix source code, I noticed that they do not provide a convenience function to retrieve the legal actions on a per-environment basis. By default, it seems they use the MiniGrid action set (rotate_ccw, rotate_cw, forward, pickup, drop, toggle, done) for all environments. As a result, the only way to implement this functionality in a wrapper is to hardcode it in some form.

As a workaround, I found that I can pass a custom action set when creating the environment itself, using the additional action_set argument of the navix.make function (in make_env.py). For example, I can specify an action set like (navix.actions.rotate_ccw, navix.actions.rotate_cw, navix.actions.forward). Furthermore, I could modify the make_navix_env function to read the legal actions from a configuration file. This adjusts the action space of the created environment, and I believe it eliminates the need to create any new action heads.
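
A minimal sketch of the configuration-driven part of this idea is shown below. `resolve_action_set` and the `legal_actions` list are hypothetical names, not part of Stoix or Navix, and a stand-in namespace replaces `navix.actions` so the snippet runs without Navix installed. The commented call at the end mirrors the `action_set` argument described above and leaves the environment ID as a placeholder.

```python
from types import SimpleNamespace
from typing import Any, Sequence, Tuple


def resolve_action_set(actions_module: Any, names: Sequence[str]) -> Tuple[Any, ...]:
    """Look up each configured action name on `actions_module`, preserving order."""
    missing = [name for name in names if not hasattr(actions_module, name)]
    if missing:
        raise ValueError(f"Unknown action names: {missing}")
    return tuple(getattr(actions_module, name) for name in names)


# Hypothetical config entry listing the legal actions for Four Rooms.
legal_actions = ["rotate_ccw", "rotate_cw", "forward"]

# Stand-in for `navix.actions`; the string values are placeholders for the real functions.
fake_actions = SimpleNamespace(
    rotate_ccw="rotate_ccw_fn",
    rotate_cw="rotate_cw_fn",
    forward="forward_fn",
    pickup="pickup_fn",
)

action_set = resolve_action_set(fake_actions, legal_actions)

# With Navix installed, the call described above would look roughly like:
#   import navix
#   action_set = resolve_action_set(navix.actions, legal_actions)
#   env = navix.make(env_id, action_set=action_set)  # env_id: your Four Rooms ID
```

Failing loudly on an unknown name keeps a typo in the configuration file from silently shrinking the action space.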

Please let me know if my understanding is incorrect.

@EdanToledo
Owner

I'm not too familiar with Navix in practice, but that sounds easier than explicitly doing action masking in the network itself.
