Hi,
In the `train_adv_actor_critic` function in `train.py`, the actor is trained on randomly sampled transitions, and those samples were generated by different (earlier) policies. Isn't that essentially off-policy training? A2C, by contrast, is an on-policy algorithm.
Could you please clarify this, and correct me if my understanding is wrong?
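The concern can be illustrated with a minimal sketch in plain Python. This is not the repository's code: `collect`, `stale_fraction`, and the `policy_version` tag are hypothetical names used only for illustration, and the rollouts are dummy records rather than real environment transitions. It contrasts a batch gathered only with the current policy (what on-policy A2C assumes) with a batch drawn uniformly from a buffer filled by several policy versions.

```python
import random

def collect(policy_version, n):
    # Hypothetical stand-in for environment rollouts: each transition
    # records which policy version generated it.
    return [{"policy_version": policy_version, "step": i} for i in range(n)]

def stale_fraction(batch, current_version):
    # Fraction of a batch produced by a policy other than the current one.
    stale = sum(1 for t in batch if t["policy_version"] != current_version)
    return stale / len(batch)

random.seed(0)

# Assumed setup: the policy has gone through versions 0..4, and every
# version's rollouts were appended to one shared buffer.
buffer = []
for version in range(5):
    buffer.extend(collect(version, 100))

current = 4

# On-policy: the batch contains only fresh rollouts from the current policy.
on_policy_batch = collect(current, 32)

# Replay-style: the batch is a uniform random sample from the whole buffer,
# so it mixes transitions from older policy versions.
replay_batch = random.sample(buffer, 32)

print("on-policy stale fraction:", stale_fraction(on_policy_batch, current))
print("replay stale fraction:   ", stale_fraction(replay_batch, current))
print("buffer stale fraction:   ", stale_fraction(buffer, current))
```

In this toy setup the on-policy batch contains no stale transitions, while four fifths of the buffer comes from older policies. If the training function samples in the second way, the policy-gradient update would be applied to data the current policy did not generate, which is the situation the question is asking about.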