Does Stoix support action masking for unused actions when creating the environment itself? #132
Comments
Hello. Action masking is currently not properly supported. The basic infrastructure to implement it is there, as you've seen, but the mask is just a dummy variable for now. Implementing it would be easy, though. You would simply need to do two things:
Let me know if you need any help with this or have any further questions.
Thanks for your response @EdanToledo. After reviewing the Navix source code, I noticed that it does not provide a convenience function to retrieve the legal actions on a per-environment basis. By default, it seems to use the MiniGrid action set (rotate_ccw, rotate_cw, forward, pickup, drop, toggle, done) for all environments. As a result, the only way to implement this functionality in a wrapper is to hardcode it in some form. As a workaround, I found that I can pass a custom action set when creating the environment by using the additional action_set argument of the navix.make function (in make_env.py). For example, I can specify an action set like (navix.actions.rotate_ccw, navix.actions.rotate_cw, navix.actions.forward). Furthermore, I could modify the make_navix_env function to read the legal actions from a configuration file. This adjusts the action space of the created environment, and I believe it eliminates the need to create any new action heads. Please let me know if my understanding is incorrect.
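The config-driven idea above could be sketched roughly as follows. This is only an illustration: resolve_action_set and EXAMPLE_REGISTRY are hypothetical names, and the three functions are stand-ins so the snippet runs without Navix installed. In a real setup the registry would presumably map each name to the matching function in navix.actions, and the resulting tuple would be passed to navix.make through the action_set argument described in this comment.

```python
from typing import Callable, Mapping, Sequence, Tuple


def resolve_action_set(
    names: Sequence[str], registry: Mapping[str, Callable]
) -> Tuple[Callable, ...]:
    """Turn action names (e.g. read from a config file) into a tuple of callables.

    The order of `names` defines the action indices of the environment.
    """
    unknown = [name for name in names if name not in registry]
    if unknown:
        raise ValueError(f"Unknown action names: {unknown}")
    return tuple(registry[name] for name in names)


# Stand-in actions so this sketch is self-contained (assumption: the real
# ones would be navix.actions.rotate_ccw, rotate_cw and forward).
def rotate_ccw(state):
    return state


def rotate_cw(state):
    return state


def forward(state):
    return state


EXAMPLE_REGISTRY = {
    "rotate_ccw": rotate_ccw,
    "rotate_cw": rotate_cw,
    "forward": forward,
}

# Hypothetical config entry listing the legal actions for one environment.
legal_action_names = ["rotate_ccw", "rotate_cw", "forward"]
action_set = resolve_action_set(legal_action_names, EXAMPLE_REGISTRY)

# With Navix installed, the tuple would then be used as described above:
#   env = navix.make(env_name, action_set=action_set)
```

Failing loudly on unknown names keeps a typo in the config from silently shrinking or reordering the action space.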
I'm not too familiar with Navix in practice, but that sounds easier than explicitly doing action masking in the network itself.
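For comparison, explicit masking at action-selection time generally means excluding illegal actions before taking the argmax over Q-values. The following is a minimal plain-Python sketch of that general technique, not Stoix's implementation; masked_greedy_action is a hypothetical helper, and the mask layout (one flag per action, truthy meaning legal) is an assumption.

```python
import math
from typing import Sequence


def masked_greedy_action(
    q_values: Sequence[float], action_mask: Sequence[bool]
) -> int:
    """Return the index of the highest Q-value among legal actions."""
    if len(q_values) != len(action_mask):
        raise ValueError("q_values and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("action_mask must allow at least one action")
    # Illegal actions get -inf so they can never win the argmax.
    masked = [
        q if legal else -math.inf for q, legal in zip(q_values, action_mask)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Seven actions, with only the first three marked legal.
q_values = [0.1, 0.5, 0.2, 0.9, 0.0, 0.3, 0.7]
action_mask = [True, True, True, False, False, False, False]
best = masked_greedy_action(q_values, action_mask)  # index 1
```

In a Q-learning setting the same mask would typically also have to be applied when computing bootstrap targets, which is part of why shrinking the action set at environment creation is the simpler route.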
I am trying to run DDQN on Navix Four Rooms, and some actions are not used. The default action space has 7 actions (left, right, forward, pickup, drop, toggle, done). The last 4 actions are irrelevant for this environment. When I inspect the timestep returned by env.reset(), the observation object has the attributes agent_view, action_mask, and step_count. The action_mask is all ones. It looks like it is just a dummy variable and is not used anywhere in the code.
Can I selectively modify the action space when creating the environment itself? Is this already implemented, or do I need to write my own code for it?