Distributed Representations, Simple Recurrent Networks, And Grammatical Structure

Abstract

In this paper three problems for a connectionist account of language are considered:

1. What is the nature of linguistic representations?

2. How can complex structural relationships such as constituent structure be represented?

3. How can the apparently open-ended nature of language be accommodated by a fixed-resource system?

Using a prediction task, a simple recurrent network (SRN) is trained on multiclausal sentences which contain multiply-embedded relative clauses. Principal component analysis of the hidden unit activation patterns reveals that the network solves the task by developing complex distributed representations which encode the relevant grammatical relations and hierarchical constituent structure. Differences between the SRN state representations and the more traditional pushdown store are discussed in the final section.
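The pipeline the abstract describes (feed a sentence word by word through an SRN, record the hidden unit activations, then analyze them with principal component analysis) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy vocabulary and sentence, the layer sizes, the random untrained weights, the 0.5 starting context, the softmax output, and the use of power iteration to find only the first principal component. The helper names (`srn_step`, `collect_hidden_states`, `first_principal_component`) are hypothetical. No training is performed, so the sketch shows only the data flow, not the representations reported in the article.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.5):
    """Random weight matrix (assumption: untrained, uniform in [-scale, scale])."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def srn_step(x, context, w_in, w_ctx, w_out):
    """One time step: the hidden layer sees the current input plus the
    previous hidden state (the context units). Returns (prediction, hidden)."""
    pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_ctx, context))]
    hidden = [1.0 / (1.0 + math.exp(-p)) for p in pre]  # logistic units
    logits = matvec(w_out, hidden)
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # softmax output (assumption)
    total = sum(exps)
    return [e / total for e in exps], hidden


def collect_hidden_states(sentence, vocab, weights, hidden_size):
    """Run a sentence through the network, recording each hidden state."""
    w_in, w_ctx, w_out = weights
    context = [0.5] * hidden_size  # assumed neutral starting context
    states, predictions = [], []
    for word in sentence:
        x = [1.0 if w == word else 0.0 for w in vocab]  # one-hot input
        prediction, context = srn_step(x, context, w_in, w_ctx, w_out)
        states.append(context)
        predictions.append(prediction)
    return states, predictions


def first_principal_component(states, iterations=200):
    """First principal component of the hidden states via power iteration
    on the covariance matrix. Returns (unit direction, per-state scores)."""
    n, d = len(states), len(states[0])
    mean = [sum(s[j] for s in states) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in states]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance left along v; keep the current direction
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, scores


# Toy setup (hypothetical vocabulary and sizes, chosen only for illustration).
rng = random.Random(0)
vocab = ["boy", "dog", "who", "chases", "sees", "."]
hidden_size = 4
weights = (
    make_matrix(hidden_size, len(vocab), rng),    # input -> hidden
    make_matrix(hidden_size, hidden_size, rng),   # context -> hidden
    make_matrix(len(vocab), hidden_size, rng),    # hidden -> output
)
sentence = ["boy", "who", "chases", "dog", "sees", "dog", "."]
states, predictions = collect_hidden_states(sentence, vocab, weights, hidden_size)
direction, scores = first_principal_component(states)
for word, score in zip(sentence, scores):
    print(f"{word:>7s}  PC1 score = {score:+.4f}")
```

Plotting the scores against word position gives a trajectory through the network's state space, one point per word. In a trained network, analysis of this kind is what the abstract refers to when it says the hidden unit patterns encode grammatical relations and constituent structure.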

Author information

Authors and Affiliations

  1. Departments of Cognitive Science and Linguistics, University of California, San Diego.

    Jeffrey L. Elman

About this article

Cite this article

Elman, J.L. Distributed Representations, Simple Recurrent Networks, And Grammatical Structure. Machine Learning 7, 195–225 (1991). https://doi.org/10.1023/A:1022699029236

  • DOI: https://doi.org/10.1023/A:1022699029236