February 2026
From Latents to Landscapes: how neural geometry turns time, goals, and attention into adaptive behavior.

Neural population geometry and optimal coding of tasks with shared latent structure

When does a neural population code support generalizable linear decodability? To answer this, the authors analyze a setting in which environmental stimuli are governed by underlying latent coordinates and a downstream unit uses a supervised Hebbian readout to generalize across multiple related tasks. They show analytically that the expected multitask generalization error is determined by a tradeoff among four geometric properties (mesoscopic "order parameters") of the neural population:
- neural-latent correlation $c$: how strongly single-unit responses correlate with latent variables
- signal-signal factorization $f$: the degree to which independent latent variables are mapped onto orthogonal directions in the neural state space
- signal-noise factorization $s$: the extent to which trial-to-trial neural noise is orthogonalized away from the primary coding directions
- neural dimension $\mathrm{PR}(\Psi)$: the effective number of dimensions spanned by the population activity, measured via the participation ratio
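The participation ratio in the last item has a standard closed form, $\mathrm{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$, where $\lambda_i$ are the eigenvalues of a covariance matrix. The sketch below is a minimal illustration, assuming $\Psi$ plays the role of the population covariance; the function names are hypothetical and not taken from the paper. It uses the identities $\sum_i \lambda_i = \mathrm{tr}(C)$ and $\sum_i \lambda_i^2 = \mathrm{tr}(C^2)$, so no eigendecomposition is needed.

```python
def covariance(samples):
    """Sample covariance of `samples`, a list of trials, each a list of unit responses."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(cov):
    """PR = tr(C)^2 / tr(C^2), equal to (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    trace_of_square = sum(cov[i][j] * cov[j][i] for i in range(d) for j in range(d))
    return trace * trace / trace_of_square


# Variance spread evenly over 3 directions versus concentrated in 1 direction.
isotropic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rank_one = [[1.0, 1.0], [1.0, 1.0]]
pr_isotropic = participation_ratio(isotropic)
pr_rank_one = participation_ratio(rank_one)
```

The two toy matrices bracket the possible range: activity spread evenly over $d$ directions gives a PR of $d$, while activity confined to a single direction gives a PR of 1.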
The authors validate this geometric decomposition across both biological and artificial systems. The core normative insight of the paper is that optimal neural geometry shifts over the course of learning.
- Early in learning (Few-Shot Regime): When data is scarce, the main goal is maximizing the total signal. Optimal representations compress less informative variables, yielding tight correlation between single neurons and task variables.
- Late in learning (Many-Shot Regime): As data becomes abundant, the priority shifts to keeping distinct variables separate from each other. The optimal representation expands, becoming increasingly high-dimensional and distributed, which causes individual single-unit correlations to drop.
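For readers unfamiliar with the readout assumed in this framework, a supervised Hebbian rule sets the readout weights to the label-weighted average of the training responses, and the "shots" in the two regimes above are the number of labeled examples entering that average. The following is a generic sketch of such a rule, not the authors' code; the function names and toy data are made up for illustration.

```python
def hebbian_weights(responses, labels):
    """Readout weights w = mean_i(y_i * x_i) for responses x_i and labels y_i in {-1, +1}."""
    d = len(responses[0])
    w = [0.0] * d
    for x, y in zip(responses, labels):
        for j in range(d):
            w[j] += y * x[j]
    return [wj / len(responses) for wj in w]


def predict(w, x):
    """Linear decision: sign of the projection of x onto the learned weights."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy data: the first unit carries the task variable, the second is mostly noise.
train_x = [[1.0, 0.1], [-1.0, -0.2], [0.9, 0.0], [-1.1, 0.1]]
train_y = [1, -1, 1, -1]
w = hebbian_weights(train_x, train_y)
guess_pos = predict(w, [0.8, -0.3])
guess_neg = predict(w, [-0.7, 0.2])
```

With few examples the averaged weights are noisy, which is one intuition for why the few-shot regime rewards representations that concentrate signal in a small number of directions.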
Duration between rewards controls the rate of behavioral and dopaminergic learning

A foundational assumption in standard, trial-based reinforcement learning (RL) is that learning is driven by the number of experiences: the more cue-reward pairings an animal encounters, the stronger the learned association. However, by training mice in a cue-reward conditioning task, the authors show that this assumption does not hold, revealing a very different biological rule governing mesolimbic dopamine and behavior.
By varying the intervals between trials and tracking both licking behavior and cue-evoked dopamine in the nucleus accumbens, the researchers found that learning happens much faster (in terms of required experiences) when trials are spaced further apart.
- The Inter-Reward Interval Rule: Within a broad range, the behavioral and dopaminergic learning rate per reward is directly proportional to the duration between rewards. If rewards are 10× further apart in time, each individual reward produces roughly 10× more learning.
- Separating Cues from Rewards: To prove the brain is tracking time between rewards and not just time between cues, the authors reduced the reward probability to 50%. The time between cue presentations stayed the same, but the time between actual rewards doubled. The learning update per reward scaled with the inter-reward interval, ignoring the number of unrewarded cue presentations.
- Constant Learning Over Time: Because rarer rewards produce proportionally larger updates, the overall amount of learning that occurs over a fixed duration remains constant, completely independent of how many cue-outcome experiences occurred during that time.
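The arithmetic behind these three points can be made explicit with a toy calculation. This is only a sketch of the proportionality described above: the constant `k`, the function names, and the example intervals are hypothetical, and the linear relation is assumed to hold only within the broad range the authors report.

```python
def inter_reward_interval(inter_cue_interval_s, reward_probability):
    """Expected time between rewards when each cue is rewarded with a fixed probability."""
    return inter_cue_interval_s / reward_probability


def learning_per_reward(iri_s, k=0.001):
    """Learning per reward, assumed directly proportional to the inter-reward interval."""
    return k * iri_s


def learning_over_session(session_s, iri_s, k=0.001):
    """Total learning in a session = number of rewards * learning per reward."""
    n_rewards = session_s / iri_s
    return n_rewards * learning_per_reward(iri_s, k)


# Rewards 10x further apart: each reward teaches about 10x more.
ratio = learning_per_reward(600.0) / learning_per_reward(60.0)

# Same cue spacing at 50% reward probability: the inter-reward interval doubles.
iri_full = inter_reward_interval(60.0, 1.0)
iri_half = inter_reward_interval(60.0, 0.5)

# Over a fixed one-hour session, total learning matches across spacings.
total_short = learning_over_session(3600.0, 60.0)
total_long = learning_over_session(3600.0, 600.0)
```

Because the number of rewards falls as 1/interval while the update per reward rises with the interval, the two factors cancel and the session total depends only on elapsed time.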
This result strongly challenges conventional trial-by-trial Temporal Difference (TD) reinforcement learning models, which are generally forward-looking and trial-counting. Instead, this time-scaling rule favors a retrospective causal-learning model. Because the brain is continuously estimating causal structure over time, rewards act as "update events" that look backward. When a reward is rare, the brain must integrate over a longer stretch of time, necessitating a massive belief update per event to maintain a consistent temporal integration window. In effect, dopamine is not just stamping in discrete trials; it is continuously estimating the causal rate of the environment.
The representation and valuation of subgoals in the human brain during model-based hierarchical behavior

How does the human brain break down complex, multi-step goals into manageable actions? To solve long, sequential tasks, humans rely on behavior that is simultaneously hierarchical (goals and sub-goals) and based on a model of the world. To uncover how the brain orchestrates this, the authors designed a "space taxi" task where participants navigated a two-step transition structure to complete a strict sequence of latent subgoals (collecting a permit, picking up aliens, and dropping them off) in order to earn a final monetary reward.
At the computational core is a model-based hierarchical RL model. At the hierarchical level, the agent tracks the current subgoal and updates it after each subgoal is completed. At the model-based level, it learns the transition structure of the environment and uses this world model to compute values for future options. The brain integrates these two functionalities to evaluate actions or options relevant to the subgoal and the structure of the world.
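To make the two levels concrete, here is a deliberately small sketch of how a subgoal tracker and a learned transition model can combine into subgoal-dependent action values. It is not the authors' model: the class name, the count-based transition estimate with a uniform prior, the states `"A"` and `"B"`, and the `serves` map from states to the subgoals they satisfy are all assumptions made for illustration.

```python
SUBGOALS = ["permit", "pickup", "dropoff"]  # order taken from the task description


class HierarchicalModelBasedAgent:
    def __init__(self, actions, states):
        self.states = list(states)
        # counts[a][s]: pseudo-count of a -> s transitions (starting at 1 is an assumed prior)
        self.counts = {a: {s: 1.0 for s in self.states} for a in actions}
        self.subgoal_index = 0

    @property
    def subgoal(self):
        """Hierarchical level: the subgoal currently being pursued."""
        return SUBGOALS[self.subgoal_index]

    def update_model(self, action, next_state):
        """Model-based level: learn the transition structure from experience."""
        self.counts[action][next_state] += 1.0

    def transition_prob(self, action, next_state):
        total = sum(self.counts[action].values())
        return self.counts[action][next_state] / total

    def action_value(self, action, serves):
        """Probability that `action` leads to a state satisfying the current subgoal."""
        return sum(
            self.transition_prob(action, s)
            for s in self.states
            if self.subgoal in serves[s]
        )

    def complete_subgoal(self):
        """Advance to the next subgoal, wrapping around after the last one."""
        self.subgoal_index = (self.subgoal_index + 1) % len(SUBGOALS)


agent = HierarchicalModelBasedAgent(["left", "right"], ["A", "B"])
serves = {"A": {"permit"}, "B": {"pickup", "dropoff"}}
for _ in range(8):
    agent.update_model("left", "A")
    agent.update_model("right", "B")

value_before = agent.action_value("left", serves)  # current subgoal: "permit"
agent.complete_subgoal()
value_after = agent.action_value("left", serves)   # current subgoal: "pickup"
```

The same action ("left") is valuable while the permit is needed and nearly worthless once the subgoal changes, even though the learned transition model is unchanged. That is the sense in which choice values are computed as a function of the current subgoal.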
Using fMRI, the researchers identified a striking division of labor across the brain that supports this hierarchical planning:
- Maintaining the Subgoal: The current latent subgoal—the internal "what am I trying to accomplish right now" state—is actively represented in the ventromedial prefrontal cortex (vmPFC) and the insula, alongside the operculum and cuneus.
- Hierarchical Valuation: When making choices, the brain doesn't just blindly calculate the value of the final reward. Instead, regions like the rostral anterior cingulate cortex (rACC) and dorsomedial frontal cortex (dmFC) track choice values that are dynamically computed as a function of the current subgoal using an internal world model. Other regions, like the ventrolateral prefrontal cortex (vlPFC), track relative exploration-weighted values.
- Action Planning: Before a transition even occurs, future actions can be decoded from the motor cortex and amygdala, demonstrating proactive action-sequence planning.
By separating the networks responsible for maintaining abstract, latent subgoals (vmPFC/insula) from the networks that compute the value of the immediate choices required to reach them (rACC/dmFC), humans can flexibly solve extended tasks using internal world models.
Geometry of neural dynamics along the cortical attractor landscape reflects changes in attention

How does attention affect the landscape of large-scale cortical activity? The authors fit a dynamical-systems model to whole-brain fMRI data from rest, tasks, and movie-watching. The model decomposed activity into intrinsic dynamics (how cortical regions drive one another) and extrinsic dynamics (how stimulus features perturb brain activity). By simulating the fitted model, they inferred stable attractors: the states toward which the cortical activity would drift. These attractors aligned with canonical cortical gradients and functional networks.
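The idea of inferring attractors by simulating a fitted model can be illustrated with a two-unit toy system. Everything specific here is an assumption rather than the authors' model: the `tanh` nonlinearity, the coupling matrix `W` (intrinsic dynamics), the input matrix `B` (extrinsic dynamics), and the Euler integration are chosen only so that the example has two stable states.

```python
import math

W = [[1.5, 0.5], [0.5, 1.5]]  # assumed intrinsic coupling between two "regions"
B = [[1.0], [0.0]]            # assumed input matrix: one stimulus feature drives unit 0


def intrinsic(x):
    """Intrinsic drive: leak plus saturating recurrent input, -x + tanh(W x)."""
    n = len(x)
    return [-x[i] + math.tanh(sum(W[i][j] * x[j] for j in range(n))) for i in range(n)]


def extrinsic(u):
    """Extrinsic drive: stimulus features u mapped onto units through B."""
    return [sum(B[i][k] * u[k] for k in range(len(u))) for i in range(len(B))]


def step(x, u, dt=0.1):
    """One Euler step of dx/dt = intrinsic(x) + extrinsic(u)."""
    f, g = intrinsic(x), extrinsic(u)
    return [x[i] + dt * (f[i] + g[i]) for i in range(len(x))]


def find_attractor(x0, dt=0.1, tol=1e-10, max_steps=100_000):
    """Run the intrinsic dynamics with zero input until the state stops moving."""
    x = list(x0)
    zero_input = [0.0] * len(B[0])
    for _ in range(max_steps):
        nxt = step(x, zero_input, dt)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


attractor_pos = find_attractor([0.5, 0.3])
attractor_neg = find_attractor([-0.5, -0.3])
```

Starting points on either side of the origin settle into different stable states, which is the toy analogue of mapping out which attractor a given pattern of cortical activity would drift toward.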
The main finding is that attentional state is reflected not simply in which brain state is active, but in the geometry of neural trajectories around attractors.
- When participants engaged in effortful cognitive tasks, their neural dynamics converged rapidly and directly into a task-relevant attractor. Functionally, focused attention acts like a steep funnel, rapidly driving the brain into a specific, stable state.
- In contrast, when participants were passively engaged in watching a sitcom, their neural dynamics occupied a much "flatter" region of the landscape. The neural trajectories wandered and were actually directed away from the deep canonical attractors.
Importantly, these effects were mainly found for intrinsic dynamics, not the extrinsic stimulus-driven component, suggesting that attentional state modulates the brain’s internal dynamical regime rather than merely reflecting incoming sensory input.