Release v0.0.5, new examples, many bug fixes. · zafarali/emdp

Add the Cakeworld MDP from the action gap paper: Bellemare et al. 2015 #16
Add the two-circle off-policy MDP from Zhang et al. 2019 #16
Change plotting functions so that the MDP is plotted in the same orientation as in the character matrix file #17
Fix the transition matrix builder: if the agent was manually placed in a wall state, there was a non-zero probability of it escaping. This was fixed in #17, along with associated tests. Walls now behave as absorbing states. As a side effect, you should also see that the value function is zero at wall states.
Fix a really nasty bug in utils.convert_one_hot_to_int that caused the action to be downcast. This function now returns a plain Python int instead of an np.int8: 59c2064
All tests are now passing again.
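
To see why absorbing walls lead to zero values there, here is a minimal sketch of policy evaluation on a made-up 3-state chain. It does not use emdp's API: the transition matrix, the rewards, and the assumption that no reward is collected inside a wall are all illustrative.

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy (not emdp's API):
# state 0 is a start state, state 1 is a goal, state 2 is a wall.
gamma = 0.9
P = np.array([
    [0.0, 1.0, 0.0],  # start state moves to the goal
    [0.0, 1.0, 0.0],  # goal loops on itself
    [0.0, 0.0, 1.0],  # wall is absorbing: it only transitions to itself
])
r = np.array([0.0, 1.0, 0.0])  # assumed: no reward is collected in the wall

# Policy evaluation in closed form: V = (I - gamma * P)^{-1} r
V = np.linalg.solve(np.eye(3) - gamma * P, r)
print(V)
```

Because the wall row of `P` puts all its mass on the wall itself and the wall reward is zero, the wall's value satisfies `V = gamma * V`, so it must be zero. If the wall could leak probability into the goal, its value would be positive.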
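
The note on `utils.convert_one_hot_to_int` does not say exactly how the downcast caused trouble, so the following is only a sketch of the general hazard. The helper `one_hot_to_int` is hypothetical and is not emdp's implementation; it shows one way to return a built-in `int`, and the last lines show how fixed-width `int8` arithmetic can wrap around silently.

```python
import numpy as np

def one_hot_to_int(one_hot):
    """Hypothetical helper: index of the hot entry as a built-in int."""
    return int(np.argmax(one_hot))

action = np.array([0, 0, 1, 0], dtype=np.int8)
idx = one_hot_to_int(action)
print(type(idx), idx)

# Fixed-width hazard: int8 array arithmetic wraps around past 127,
# so any index arithmetic done in int8 can silently give a wrong value.
wrapped = np.array([100], dtype=np.int8) * 2
safe = 100 * 2  # built-in Python ints do not overflow
print(wrapped, safe)
```

Returning a built-in `int` keeps later arithmetic on the action (for example, computing a flat state-action index) out of NumPy's fixed-width integer types.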
