If I understand correctly, the code in `tensorflow2/actor_critic.py` implements the One-step Actor-Critic (episodic) algorithm given on page 332 of RLbook2020 by Sutton and Barto (picture given below).
Here we can see that the critic parameters `w` are updated using only the gradient of the value function at the current state S, which is written as `grad(V(S, w))` in the pseudocode shown above. The update omits the gradient of the value function at the next state S': there is no `grad(V(S', w))` term in the update rule for the critic parameters `w`.
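For reference, the two relevant lines of that pseudocode can be written out as follows. This is a transcription from memory of the book's box, with the book's value estimate written as V to match the notation used here, and with V(S', w) taken to be 0 when S' is terminal:

$$
\begin{aligned}
\delta &\leftarrow R + \gamma\, V(S', \mathbf{w}) - V(S, \mathbf{w}) \\
\mathbf{w} &\leftarrow \mathbf{w} + \alpha^{\mathbf{w}}\, \delta\, \nabla_{\mathbf{w}} V(S, \mathbf{w})
\end{aligned}
$$

V(S', w) enters only through the scalar δ; it is not differentiated.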
In the code given below, computing `state_value_, _ = self.actor_critic(state_)` (L43) inside the `GradientTape` would cause `grad(V(S', w))` to appear in the update for `w`, which contradicts the pseudocode shown above.
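To make the difference concrete, here is a minimal sketch in plain Python, not the repository's code. It assumes a hypothetical linear critic V(s, w) = w · x(s) and a squared TD error loss, and the names `semi_gradient_step` and `full_gradient_step` are made up for illustration. The first function follows the book's update; the second differentiates through V(S') as well, which is what a tape that records the next-state forward pass would do.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td_error(w, x_s, x_next, reward, gamma, done=False):
    """delta = R + gamma * V(S', w) - V(S, w), with V(S', w) = 0 if terminal."""
    v_next = 0.0 if done else dot(w, x_next)
    return reward + gamma * v_next - dot(w, x_s)


def semi_gradient_step(w, x_s, x_next, reward, gamma, alpha, done=False):
    """Book-style update: w <- w + alpha * delta * grad V(S, w).

    For a linear critic, grad V(S, w) is just x(S). V(S') is treated as a
    constant target, so x(S') never appears in the step direction.
    """
    delta = td_error(w, x_s, x_next, reward, gamma, done)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x_s)]


def full_gradient_step(w, x_s, x_next, reward, gamma, alpha, done=False):
    """Gradient descent on 0.5 * delta**2, differentiating through V(S') too.

    grad_w(0.5 * delta**2) = delta * (gamma * x(S') - x(S)), so the step is
    w <- w + alpha * delta * (x(S) - gamma * x(S')).
    """
    delta = td_error(w, x_s, x_next, reward, gamma, done)
    g = 0.0 if done else gamma
    return [wi + alpha * delta * (xi - g * xn)
            for wi, xi, xn in zip(w, x_s, x_next)]


if __name__ == "__main__":
    w = [0.5, -0.5]
    x_s, x_next = [1.0, 0.0], [0.0, 1.0]  # one-hot features for S and S'
    print(semi_gradient_step(w, x_s, x_next, reward=1.0, gamma=0.9, alpha=0.1))
    print(full_gradient_step(w, x_s, x_next, reward=1.0, gamma=0.9, alpha=0.1))
```

With one-hot features, the semi-gradient step leaves the weight for S' untouched, while the full-gradient step also moves it. In TensorFlow, one way to get the semi-gradient behaviour would be to compute V(S') outside the tape or wrap it in `tf.stop_gradient`; whether that is the intended fix here is for the maintainer to confirm.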
Youtube-Code-Repository/ReinforcementLearning/PolicyGradient/actor_critic/tensorflow2/actor_critic.py
Lines 40 to 45 in 1ef7605
Please let me know if there are some gaps in my understanding!