train_adv_actor_critic logic #3

qwedaq · 2021-07-04T01:17:04Z

Hi,

In the `train_adv_actor_critic` function in train.py, the actor is trained on randomly sampled transitions that were generated by different policies. Isn't that essentially off-policy, whereas A2C is an on-policy algorithm?
Could you please clarify this, and correct me if my understanding is wrong?
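
To make the concern concrete, here is a minimal sketch. It is not the code from train.py: the buffer layout, the `policy_version` field, and the `fraction_stale` helper are all hypothetical, and are only meant to show how a uniformly sampled minibatch can mix transitions collected under older policies.

```python
import random


def fraction_stale(batch, current_version):
    """Share of transitions in `batch` collected by a policy other than the current one."""
    stale = sum(1 for t in batch if t["policy_version"] != current_version)
    return stale / len(batch)


# Hypothetical buffer: 4 transitions collected under each of policy versions 0, 1 and 2.
buffer = [{"policy_version": v, "step": s} for v in range(3) for s in range(4)]
current_version = 2  # the policy that is about to be updated

rng = random.Random(0)
batch = rng.sample(buffer, 6)  # uniform random minibatch drawn from the whole buffer
print(fraction_stale(batch, current_version))
```

If that fraction is above zero, the actor update uses data from policies other than the one being updated, which is what I would describe as off-policy. As I understand it, a strictly on-policy A2C update would use only transitions collected by the current policy.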
