Rubicon Choose Results #328
-
June, 2024

All of the runs have 6 arms, each with a unique CS, organized into 2 groups with the same 3 US outcomes.
Each comparison condition is run with N=25 repetitions, in parallel, to provide reliable results.
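For concreteness, here is a minimal standalone Go sketch of the task structure described above; the `Arm` type and its field names are made up for illustration and are not the actual emergent/axon configuration:

```go
package main

import "fmt"

// Arm is a hypothetical stand-in for one arm of the choice task:
// a unique CS, an associated US outcome, and the probability of
// actually receiving that US when the arm is chosen.
type Arm struct {
	CS     string  // unique conditioned stimulus cueing this arm
	US     string  // unconditioned stimulus (outcome) for this arm
	USProb float32 // probability of delivering the US (1.0 = certain)
}

func main() {
	// 6 arms, organized into 2 groups with the same 3 US outcomes.
	// The "bad" group delivers its US with p = 0.8; the "good" group with p = 1.0.
	badUSProb := float32(0.8)
	uss := []string{"USa", "USb", "USc"}
	var arms []Arm
	for i, us := range uss {
		arms = append(arms, Arm{CS: fmt.Sprintf("CSgood%d", i), US: us, USProb: 1.0})
		arms = append(arms, Arm{CS: fmt.Sprintf("CSbad%d", i), US: us, USProb: badUSProb})
	}
	// N = 25 parallel repetitions (independent networks) per condition.
	nRuns := 25
	fmt.Println("arms:", arms, "runs per condition:", nRuns)
}
```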
## Bad US probability = 0.8

### Behavior

Probability of selecting the lower-probability arm:

There is a robust effect from a relatively small probability difference; 0.9 is actually relatively similar. This is the strongest of the manipulations, because the 0 outcome drives extinction learning in the BLA, which then inhibits the positive BLA. This can be titrated by the level of that inhibition; these results use inhib = 1.

### BLA Ext

(plot of BLA Ext activity)

### VS Matrix Go - No

The VS Matrix Go vs. No pathways are ultimately responsible for engaging the plan to approach an arm. This plot shows the activity in the Go pathway minus the No pathway, for the Good vs. Bad arms. The difference is mostly in the Go pathway, although later in learning the No pathway does learn to oppose the Bad choices.

### Learned Estimates of PVpos Outcomes

A critical feature of the model is that the PFC areas (OFCpos, ILpos) learn to predict the subsequent US outcome at the time of the CS. The following graph shows this PVposEst value at the start of the arm, when first seeing the CS. It clearly learns to predict the difference in outcomes.

We also record the spread of activity over the PVposEst layer, as a measure of the expected variance in predicted outcomes. The following plot shows this variance (again at the time of CS onset), showing that the p=0.8 and 0.9 cases have higher variance relative to the 100% case.

### Active Maintenance in PFC layers

The PFC layers show differential activity for the control vs. p=0.8 case, across all relevant layers (plots for OFCpos, ILpos, and PL).
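As a sanity check on the PVposEst variance point above, the arithmetic of a Bernoulli outcome already predicts this pattern; here is a small standalone sketch with my own illustrative numbers, not values read out of the model:

```go
package main

import "fmt"

func main() {
	// For an arm that delivers a US of magnitude m with probability p
	// (and nothing otherwise), the expected value and variance are:
	//   EV  = p*m
	//   Var = p*(1-p)*m^2
	m := 1.0
	for _, p := range []float64{1.0, 0.9, 0.8} {
		ev := p * m
		vr := p * (1 - p) * m * m
		fmt.Printf("p=%.1f  EV=%.2f  Var=%.3f\n", p, ev, vr)
	}
	// p=1.0 -> Var=0.000, p=0.9 -> Var=0.090, p=0.8 -> Var=0.160:
	// the probabilistic arms have both lower expected value and higher
	// outcome variance, consistent with the greater spread of activity
	// over the PVposEst layer for the p=0.8 and 0.9 cases.
}
```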
-
## US Magnitude = 0.2, 0.5

Overall, similar results are found with the manipulation of US magnitude.
### Behavior

The key difference here is that the initial difference in behavioral choice weakens over time, because unlike the US omission case, the model learns to expect the reliable difference in reward outcome, and thus the RPE dopamine signal between conditions converges over time, reducing the overall difference among conditions.

### DA RPE

This plot shows the decrease in DA over time, due to reward prediction:

### BLA

The BLA exhibits the same differentiation among conditions, without any differences in extinction activity (not shown).

### VS Matrix Go - No

Likewise, differences in VS matrix activity:

### PFC Activity

Similar differentiation in PFC activity (plots for OFC and PL).
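To illustrate the DA RPE convergence point above, here is a minimal delta-rule sketch (standard Rescorla-Wagner style value learning, not the actual Rubicon equations), showing how the RPE difference between a large- and small-magnitude arm shrinks as the values are learned:

```go
package main

import "fmt"

func main() {
	// Delta-rule value learning for two arms with deterministic rewards
	// of different magnitude. The RPE is delta = r - V, and as V
	// converges to r, delta (and hence the phasic DA difference between
	// conditions) goes to zero.
	alpha := 0.1
	rGood, rBad := 1.0, 0.2
	vGood, vBad := 0.0, 0.0
	for trial := 1; trial <= 50; trial++ {
		dGood := rGood - vGood
		dBad := rBad - vBad
		vGood += alpha * dGood
		vBad += alpha * dBad
		if trial%10 == 0 {
			fmt.Printf("trial %2d: RPE good=%.3f bad=%.3f diff=%.3f\n",
				trial, dGood, dBad, dGood-dBad)
		}
	}
}
```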
-
## Effort Cost

The effort cost manipulation shows similar effects. The effort costs are compressed, so large effects are required to see behavioral effects:
### Behavior

The increased effort cost causes the model to avoid the bad arms, as expected:

### Prediction of Negative PV outcomes (effort)

The model learns to predict the increased effort cost for the bad arms:

Along with the variance in negative value:

### VS Matrix Go - No

Similar differences:
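As a rough illustration of the compression point: if raw effort costs pass through a saturating function before entering the value comparison, large raw differences shrink to small effective ones. The `compress` function below is a hypothetical example, not the model's actual cost normalization:

```go
package main

import "fmt"

// compress is a hypothetical saturating cost function: raw effort c
// maps into [0,1), so large raw differences become small effective ones.
func compress(c float64) float64 {
	return c / (1 + c)
}

func main() {
	low, high := 1.0, 4.0 // raw effort for good vs. bad arms (made-up values)
	fmt.Printf("raw gap: %.2f  compressed gap: %.2f\n",
		high-low, compress(high)-compress(low))
	// raw gap 3.00 -> compressed gap 0.30: only a fraction of the raw
	// difference survives, so large effort manipulations are needed to
	// produce a visible behavioral effect.
}
```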
-
## BLA vs. direct VS Matrix Go v. No

The model can make choices based on two main neural signals at the time of the CS onset: the BLA and the PFC (via its learned outcome estimates). Both of these feed into the VS Matrix Go and No pathways, which compete to make the choice about whether to approach a given arm.

In the probabilistic US condition, we can vary the strength of the BLAposExt -> BLAposAcq inhibition to control how much the BLA factor contributes to avoiding the bad outcomes. The inhibition strength is varied across runs:
The behavioral effects show that the BLA makes an important contribution:

And here is the direct effect on BLAposAcq activity, when it is less and less inhibited:

And the effects of BLA on VS Matrix Go vs. No:

Thus, overall, it is clear that both the PFC and BLA effects contribute in roughly equal proportion to overall choice performance.
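A conceptual sketch of how these two factors could combine at choice time; the linear weighting, the specific numbers, and the way the Ext -> Acq inhibition enters are illustrative assumptions, not the actual VS Matrix / BLA equations:

```go
package main

import "fmt"

// goMinusNo sketches the net approach drive for one arm at CS onset:
// the BLA contribution is its acquisition activity minus the extinction
// activity scaled by the Ext -> Acq inhibition strength, and the PFC
// contribution is the learned outcome estimate (a PVposEst-like value).
func goMinusNo(blaAcq, blaExt, inhib, pvEst, wBLA, wPFC float64) float64 {
	bla := blaAcq - inhib*blaExt
	if bla < 0 {
		bla = 0
	}
	return wBLA*bla + wPFC*pvEst
}

func main() {
	wBLA, wPFC := 0.5, 0.5 // roughly equal contributions (made-up weights)
	for _, inhib := range []float64{1.0, 0.5, 0.0} {
		// "bad" arm: some extinction learning has occurred, lower PV estimate
		bad := goMinusNo(0.8, 0.6, inhib, 0.6, wBLA, wPFC)
		// "good" arm: no extinction, higher PV estimate
		good := goMinusNo(0.8, 0.0, inhib, 0.9, wBLA, wPFC)
		fmt.Printf("inhib=%.1f  good-bad Go drive difference = %.3f\n",
			inhib, good-bad)
	}
	// As the inhibition is weakened, the BLA no longer differentiates the
	// arms, and only the PFC-based estimate separates good from bad.
}
```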
-
# Results from the Rubicon choose model

This thread contains first-pass notes on results from the model.
Overall, the results show that the model learns to accurately predict differential outcomes, cued by the CSs associated with each arm and driven by the US outcomes experienced. The PFC layers, BLA, and VS Matrix show differential activity consistent with the learned expectations, and drive the goal-driven behavior of the model toward choosing the good over the bad outcomes.
"Bad" in most of the cases in this model just means "less good", as a result of lower expected value, but not actually negative outcomes. Thus, we do not expect the model to never select the "bad" option, and more discriminating choice behavior will require a mechanism for directly comparing options.