Allow for a narrow sigma
When the model has figured out the right action, there is no point in it continuing to explore.
Therefore, the deviation should be allowed to become really narrow.
Clamping sigma to a high minimum value can cause the policy to collapse after it has figured out that it has a good policy.
At least, that's the intuition.
JulioJerez committed Sep 16, 2024
1 parent 486df49 commit 0657a68
Showing 1 changed file with 1 addition and 1 deletion.
@@ -577,7 +577,7 @@ void ndBrainAgentContinuePolicyGradient_TrainerMaster::OptimizePolicy()
 {
 const ndBrainFloat mean = output[i];
 ndAssert(ndExp(output[i + numberOfActions]) > 0.0f);
-const ndBrainFloat sigma1 = ndMax (ndExp(output[i + numberOfActions]), ndFloat32(1.0e-2f));
+const ndBrainFloat sigma1 = ndMax (ndExp(output[i + numberOfActions]), ndFloat32(1.0e-4f));
 const ndBrainFloat sigma2 = sigma1 * sigma1;
 const ndBrainFloat sigma3 = sigma2 * sigma1;
 const ndBrainFloat num = (actions[i] - mean);
