SWt and synaptic scaling issues #44

rcoreilly · 2021-05-25T07:55:40Z

rcoreilly
May 25, 2021
Maintainer

The new SWt mechanism simulates the structural, slowly-adapting spine-level plasticity "outer loop" to faster AMPA-based plasticity.

Wt = SWt * Sig(LWt) -- SWt is a multiplicative factor
Randomly initializing SWt (to some extent) gives each neuron a distinct bias in what it learns about: a continuous version of random connectivity. In principle this can prevent all neurons from chasing after the same ball, like toddlers playing soccer (a collective form of hogging), which is measured with PCA eigenvalues.
SWt slowly adapts based on current learned Wts, updating the "prior" bias, subject to constraints (mean, min, max limits) that encourage good behavior and prevent hogging.
Overall synaptic scaling to maintain the TrgAvg target average activity level happens on the fast (AMPA) learned weights -- not on the SWts! Trying to enforce it through bigger changes in the SWt means didn't work well. Syn scale also needs to be faster than SWt adaptation, and independent of SWt in cases where SWt is not being used.
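As a rough illustration of Wt = SWt * Sig(LWt) and of randomly initialized SWt, here is a minimal Python sketch. The exact form of Sig is not given in this thread; the Leabra-style contrast-enhancement sigmoid below, its `gain` and `off` defaults, the uniform initialization range, and all function names are assumptions made for illustration only.

```python
import random

def sig(lwt, gain=6.0, off=1.0):
    """Contrast-enhancing sigmoid on a linear weight in [0, 1].

    ASSUMPTION: Leabra-style form 1 / (1 + (off * (1 - w) / w) ** gain);
    the thread only writes Sig(LWt) without defining it.
    """
    if lwt <= 0.0:
        return 0.0
    if lwt >= 1.0:
        return 1.0
    return 1.0 / (1.0 + (off * (1.0 - lwt) / lwt) ** gain)

def effective_wt(swt, lwt):
    """Wt = SWt * Sig(LWt): SWt acts as a multiplicative factor."""
    return swt * sig(lwt)

def init_swts(n, mean=0.5, var=0.25, seed=0):
    """Random SWt values around a mean (hypothetical uniform init)."""
    rng = random.Random(seed)
    return [mean + rng.uniform(-var, var) for _ in range(n)]

if __name__ == "__main__":
    # Same LWt everywhere, but each synapse gets a different effective Wt
    # because its SWt differs: a continuous form of random connectivity.
    for swt in init_swts(4):
        print(round(swt, 3), round(effective_wt(swt, 0.5), 3))
```

With identical LWt values, the spread in effective weights comes entirely from SWt, which is the per-neuron bias described above.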

rcoreilly · 2021-05-25T08:12:45Z

rcoreilly
May 25, 2021
Maintainer Author

The constrained Mean of SWt, which is the same as the mean of the initial random weights, significantly affects the initial PCA NStrong and overall learning performance. In general, mean = .5 learns significantly better than .4.

V2: .5 = lower PCA that generally then drops over time, .4 = higher PCA that stays flat or rises. ActAvg is the same. With .5, PCA and activity eventually get so low that this seems to be the cause of the blowup. V2m8 is particularly bad for loss of PCA and Act.
V4: less diff overall in PCA and trajectory, and now .4 has lower at start than .5, both generally rise. MaxGeM and act also rise, sometimes too high in .4 case.
TEO: even less diff than V4 in PCA, all rise.

TODO: try different params for V2 vs. other prjns. Also see next for interaction with SWt adapting, SynScale.


rcoreilly · 2021-05-25T08:15:39Z

rcoreilly
May 25, 2021
Maintainer Author

When scaling LWt to match TrgAvg, we have a "credit assignment" problem -- just change all weights the same amount, or differentially affect the stronger weights?

If using Wt credit assignment, we get a strong positive feedback loop -- it causes some weights to rise more, and can drive MaxGeM out of range with mean = .4.

SWt is probably the best credit assignment factor -- it does something but not too much. Using no factor (None) can cause too little differentiation in weights and impair PCA over time.
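The three credit assignment options for scaling LWt toward TrgAvg can be sketched as follows. This is only an illustration: the function names, the `rate` parameter, the clamping to [0, 1], and the additive form of the update are assumptions, not the actual implementation.

```python
def credit_factors(swts, wts, mode):
    """Per-synapse factor that decides how much of the scaling each weight gets.

    mode "none": every weight changes by the same amount
    mode "swt":  change is proportional to the slow structural weight
    mode "wt":   change is proportional to the effective weight
                 (the option reported above as a positive feedback loop)
    """
    if mode == "none":
        return [1.0] * len(swts)
    if mode == "swt":
        return list(swts)
    if mode == "wt":
        return list(wts)
    raise ValueError(f"unknown credit assignment mode: {mode}")

def scale_lwts(lwts, factors, avg_diff, rate=0.01):
    """Nudge LWt values so average activity moves toward the target.

    avg_diff = TrgAvg - ActAvg (positive when the neuron is under-active).
    ASSUMPTION: simple additive update, clamped to [0, 1].
    """
    return [
        min(1.0, max(0.0, lwt + rate * avg_diff * f))
        for lwt, f in zip(lwts, factors)
    ]

if __name__ == "__main__":
    lwts = [0.5, 0.5]
    swts = [0.2, 0.8]
    wts = [0.1, 0.4]  # hypothetical effective weights
    for mode in ("none", "swt", "wt"):
        print(mode, scale_lwts(lwts, credit_factors(swts, wts, mode), 0.1, rate=1.0))
```

With "none" both weights move equally, so the scaling adds no differentiation; with "swt" the differentiation is bounded by the SWt limits; with "wt" the weights that are already strong move most, which is where the runaway can come from.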


rcoreilly · 2021-05-25T08:20:16Z

rcoreilly
May 25, 2021
Maintainer Author

Major issue: how to update SWt -- this interacts with mean and SynScaling in ways I'm just now realizing.

Have been doing: SWt += lrate * (Wt - SWt)
but Wt already includes SWt, and also includes SynScaling! This can easily drown out learning signals -- Wt is contrast enhanced, and SWt can never really match it given its own constraints on mean / min / max, so it is not a good training signal.
Instead: separately aggregate DWt and use that? As a sum? Minibatch just uses the sum, which is simplest.
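The difference between the two SWt update rules discussed above can be sketched in Python. The function names, the min / max limits of 0.2 and 0.8, and the use of a plain sum as the aggregate are assumptions for illustration; the mean constraint is left out for brevity.

```python
def clamp(x, lo, hi):
    """Keep x within [lo, hi]."""
    return max(lo, min(hi, x))

def update_swt_from_wt(swt, wt, lrate):
    """Old rule: SWt += lrate * (Wt - SWt).

    Wt already contains SWt, contrast enhancement, and synaptic scaling,
    so the difference is nonzero even when nothing was learned.
    """
    return swt + lrate * (wt - swt)

def update_swt_from_dwt(swt, dwts, lrate, lo=0.2, hi=0.8):
    """Proposed rule: drive SWt from the separately aggregated DWt.

    ASSUMPTION: the aggregate is a plain sum over the DWt values
    collected since the last SWt update, and lo / hi are hypothetical limits.
    """
    sdwt = sum(dwts)
    return clamp(swt + lrate * sdwt, lo, hi)

if __name__ == "__main__":
    # No learning occurred (all DWt are zero), but Wt differs from SWt
    # because of contrast enhancement and scaling.
    print(update_swt_from_wt(0.5, 0.9, 0.1))               # SWt drifts anyway
    print(update_swt_from_dwt(0.5, [0.0, 0.0, 0.0], 0.1))  # SWt stays put
```

In this sketch the old rule moves SWt whenever Wt and SWt differ, regardless of learning, while the aggregate-DWt rule changes SWt only in response to actual weight changes.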


rcoreilly · 2021-05-25T08:49:05Z

rcoreilly
May 25, 2021
Maintainer Author

Particular challenges here are that just about anything works fine for ra25 and objrec, so problems only show up in longer-time behavior in lvis, which makes for a long debug loop. Also, I started out doing syn scale in SWt, so didn't realize how that was being contaminated. Now at least have some key steps and a sense of the space of issues...


rcoreilly · 2021-05-31T06:54:58Z

rcoreilly
May 31, 2021
Maintainer Author

In the very well-performing run 459 with V1 shortcut cons to all, turning off SWt causes nearly a .1 impairment in cosdiff, but the err score is barely different. All the internal health measures (PCA etc.) are much better in the version with SWt, but they're not showing up in final performance. Maybe need a better decoder.

Also, the SWt lrate makes only tiny diffs from .1 to .001! It goes in the direction of being less constrained with .1, but still, the diffs are remarkably tiny. This suggests that the aggregate SDWt is likely to be very small overall?


rcoreilly · 2021-06-15T17:53:46Z

rcoreilly
Jun 15, 2021
Maintainer Author

For LVis on 100 objects, running for 2000 epochs, finally seeing differences between SWt learning rates: .001 > .01 > .1 in terms of preservation of PCA structure -- also shows up to a relatively smaller extent in the overall performance.

