Abstract
A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
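The arbitration principle can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each controller reports, for every available action, a value estimate with an associated variance, and that the estimate with the lower variance is trusted for each action before the best action is chosen. The names `Estimate` and `arbitrate`, the "tree" and "cache" labels for the prefrontal and dorsolateral striatal controllers, the tie-breaking rule, and all numbers are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """One controller's belief about the value of a single action."""
    mean: float      # estimated long-run reward
    variance: float  # uncertainty about that estimate


def arbitrate(tree: dict[str, Estimate],
              cache: dict[str, Estimate]) -> tuple[str, dict[str, str]]:
    """Per action, trust the less uncertain controller, then act greedily.

    Returns the chosen action and a map recording which controller
    supplied the value for each action. Ties go to the tree controller
    (an arbitrary choice made for this sketch).
    """
    values: dict[str, float] = {}
    source: dict[str, str] = {}
    for action in tree:
        if tree[action].variance <= cache[action].variance:
            values[action] = tree[action].mean
            source[action] = "tree"
        else:
            values[action] = cache[action].mean
            source[action] = "cache"
    best = max(values, key=lambda a: values[a])
    return best, source


# Hypothetical numbers: early in training the cached values are still noisy.
early = arbitrate(
    tree={"press": Estimate(1.0, 0.2), "wait": Estimate(0.1, 0.2)},
    cache={"press": Estimate(0.3, 0.9), "wait": Estimate(0.0, 0.9)},
)

# Hypothetical numbers: after extensive training the cached values are the
# more certain ones, so they control behavior even though the tree
# controller now assigns a low value to "press".
late = arbitrate(
    tree={"press": Estimate(0.0, 0.3), "wait": Estimate(0.1, 0.3)},
    cache={"press": Estimate(1.0, 0.05), "wait": Estimate(0.0, 0.05)},
)
```

With these made-up inputs, the tree controller supplies every value in the `early` case and the cache controller supplies every value in the `late` case, which is the kind of shift in dominance that the abstract attributes to relative uncertainty.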
Acknowledgements
We are grateful to B. Balleine, A. Courville, A. Dickinson, P. Holland, D. Joel, S. McClure and M. Sahani for discussions. The authors are supported by the Gatsby Foundation, the EU Bayesian Inspired Brain and Artefacts (BIBA) project (P.D., N.D.), a Royal Society USA Research Fellowship (N.D.) and a Dan David Fellowship (Y.N.).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Value propagation in tree search, after 50 steps of learning the task in Figure 1a. (PDF 248 kb)
Supplementary Fig. 2
Example of learning in the cache algorithm, following a single transition from state s to s′ having taken action a. (PDF 306 kb)
About this article
Cite this article
Daw, N., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8, 1704–1711 (2005). https://doi.org/10.1038/nn1560
DOI: https://doi.org/10.1038/nn1560
