How We Think: Brain-Inspired Models of Human Cognition Contribute to the Foundations of Today’s Artificial Intelligence

AWARDEES: Geoffrey Hinton, James L. McClelland, David E. Rumelhart

FEDERAL FUNDING AGENCIES: Department of Defense, National Institutes of Health, National Science Foundation

Decades before artificial intelligence emerged as the platform for innovation that it is today, David Rumelhart, James McClelland, and Geoffrey Hinton were exploring a new model to explain human cognition. Dissatisfied with the prevailing symbolic theory of cognition, David Rumelhart began to articulate the need for a new approach to modeling cognition in the mid-1970s, teaming up with McClelland, with support from the National Science Foundation, to create a model of human perception built on a new set of foundational ideas. At around the same time, Don Norman, an early leader in the field of cognitive science, obtained funding from the Sloan Foundation to bring together an interdisciplinary group of junior scientists, including Hinton, with backgrounds in computer science, physics, and neuroscience. Rumelhart, McClelland, and Hinton led the development of the parallel distributed processing framework, also known as PDP, in the early 1980s, focusing on how networks of simple processing units, inspired by the properties of neurons in the brain, could give rise to human cognitive abilities. While many had dismissed the use of neural networks as a basis for building models of cognition in the 1960s and 1970s, the PDP group revived interest in the approach. Skeptics continued to critique the new models, which had only limited success in enabling effective artificially intelligent systems until the 2010s, when massive increases in the amount of available data and computing power enabled Hinton and others to achieve breakthroughs leading to an explosion of new technological advancements and applications.

Understanding Neural Networks

Long before the parallel distributed processing framework could be realized, scientists were already exploring the biological structure of neurons in the brain and how interconnected networks of neurons might underlie human cognition and perception. Human brain cells, called neurons, form a complex, highly interconnected network of units that send electrical signals to each other, communicating via specialized connections called synapses. Researchers built upon these foundational observations to develop theories about how the strengths of connections in the brain could be adapted to create layered networks that could perform complicated tasks like recognizing objects. Modeling was also a critical component of demonstrating how a biological neural network could work, and researchers used computer simulations of adaptive networks, inspired by the neural circuitry of the brain, to explore these ideas.

In 1958, Frank Rosenblatt developed the Perceptron learning procedure, which he implemented on a 5-ton computer the size of a room. The Perceptron could be fed a series of cards with markings on the left or right. After 50 trials, the computer taught itself to distinguish cards marked on the left from cards marked on the right. Rosenblatt called the Perceptron “the first machine capable of having an original idea.” While Rosenblatt’s vision was prophetic, his model was limited. Although it relied upon several layers of neuron-like units, it had only one layer of connections that could “learn.” In 1969, Marvin Minsky and Seymour Papert published a book, Perceptrons, arguing that there were fundamental limitations to what tasks could be learned using a single layer of adaptive connections, and questioning whether neural networks could ever prove useful in carrying out truly intelligent computations.
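For readers curious about the mechanics, the sketch below is a minimal Python illustration of the perceptron learning rule, with made-up “cards” (rows of pixels marked on the left or right half) standing in for Rosenblatt’s cards; the card generator, the learning rate, and the card width are all illustrative assumptions, not details of his machine.

```python
import random

# Toy stand-in for Rosenblatt's cards: each "card" is a row of pixels,
# labeled +1 if the mark falls on the right half and -1 if it falls on the left.
def make_card(side, width=8):
    pixels = [0.0] * width
    if side == -1:
        pos = random.randrange(width // 2)          # mark somewhere on the left half
    else:
        pos = random.randrange(width // 2, width)   # mark somewhere on the right half
    pixels[pos] = 1.0
    return pixels, side

weights = [0.0] * 8
bias = 0.0
learning_rate = 0.1

# Fifty training trials, echoing the fifty trials described above.
training_cards = [make_card(random.choice([-1, 1])) for _ in range(50)]

for pixels, label in training_cards:
    activation = sum(w * x for w, x in zip(weights, pixels)) + bias
    prediction = 1 if activation >= 0 else -1
    if prediction != label:
        # The perceptron rule: adjust the connections only when the guess is wrong.
        weights = [w + learning_rate * label * x for w, x in zip(weights, pixels)]
        bias += learning_rate * label

# The learned weights tend toward negative values for left-half pixels and
# positive values for right-half pixels, separating the two kinds of cards.
print(weights)
```

The key point, which carries forward into later neural network models, is that the machine is never handed a rule; it only nudges its connection strengths whenever it guesses wrong.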

In part because of Minsky and Papert’s influence, the dominant approach in artificial intelligence and cognitive psychology in the 1950s through the 1970s was focused on symbolic processing systems. In the symbolic approach, processes were often thought to be modular and compared to computer programs – sequential, ordered lists of rules that, when applied to some symbolic input (for example, the present tense of a verb), would produce a desired output (for example, the verb’s past tense, generated by rules such as ‘if it ends in e, add d; otherwise, add ed’). The structure of the neural networks in the brain, on which everyone agreed the mind was implemented, was thought to be essentially irrelevant to understanding cognitive function. But by the late 1970s, it was becoming apparent that models built on these assumptions were failing to capture many basic and fundamental aspects of human cognition, spurring the search for new theoretical alternatives.
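For contrast, here is a minimal sketch, in Python, of what such a symbolic, rule-based account looks like; the function name and example verbs are illustrative, but the rule itself is the one quoted above.

```python
def past_tense(verb: str) -> str:
    """Symbolic-style rule: 'if it ends in e, add d; otherwise, add ed'."""
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

print(past_tense("bake"))  # baked
print(past_tense("walk"))  # walked
print(past_tense("go"))    # "goed" -- the rule alone cannot handle irregular verbs
```

Irregular verbs such as “go” immediately require extra, explicitly listed exceptions, one of the issues that later motivated learning-based alternatives.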

David E. Rumelhart

The PDP Research Group

In the late 1960s, after earning his Ph.D. at Stanford University, David Rumelhart, a mathematical psychologist by training, joined the psychology department at the University of California, San Diego. Rumelhart was interested in building explicit computational models of human cognition and explored models within the symbolic paradigm in the early 1970s. Soon, however, he found the classic symbolic framework for understanding human thought processes to be unsatisfactory. Starting in the mid-1970s, he wrote several papers addressing the shortcomings of the symbolic approach and beginning to explore alternatives that might overcome its limitations. Don Norman, the senior scientist in the lab at the time, recognized Rumelhart’s potential and helped support his efforts to pursue alternative approaches.

In 1974, James McClelland joined UCSD, and it was there that McClelland and Rumelhart discovered their mutual interest in going beyond the symbols and rules of the prevailing paradigm. Like Rumelhart, McClelland felt that the symbolic theory fell short of capturing many aspects of cognition, particularly the role of context in perception and language understanding. For example, context lets us see the very same set of three line segments as two different letters when those segments occur in different contexts, as shown in the image below. The symbolic models of the early days of cognitive psychology could not explain such findings, since each letter had to be identified individually before information about possible words could be accessed.

In the late 1970s, Rumelhart and McClelland each received grants from the National Science Foundation, allowing them to focus on capturing the influence of word context on the perception of letters. McClelland recalls, “[Dave and I] had coffee together and he asked me what I was working on. I told him I was trying to build a model inspired by his paper on context effects to capture findings I had explored as a graduate student. And he said, ‘that's funny, I'm trying to build a model of some of my own related findings!’ I think we were both excited to join forces, combining his experience as a modeler with my background as an experimental scientist. I was particularly impressed by Dave’s ability to take vague ideas and turn them into models that made them explicit, so that their implications could be fully explored.” Their joint work was an early instance of a neural network model, leading them both in many new directions.

James McClelland

Rumelhart, McClelland, and others began experimenting and publishing ideas to help strengthen the case for a new framework. McClelland’s early research looked at how people recognized words and represented categories, and how these processes might be modeled using neural networks. One of the early models Rumelhart developed focused on how people type on a keyboard (people often prepare to type several successive letters simultaneously). “Ironically, he was a terrible typist,” recalls his son Peter, who often found himself to be an early “test subject” for Rumelhart’s research ideas.

Rumelhart and McClelland’s early modeling work led them to consider how learning occurs in a neural network. Rumelhart’s son Karl recalls, “He was curious to learn how people learn,” and that curiosity was a driving force behind his research. “In a neural network, it is the connections between the neuron-like processing units that determine the computations that the system performs,” explains McClelland. “We both became fascinated with understanding more about how the right connections become established – or in other words, how the brain learns.” Together Rumelhart and McClelland developed models showing how a neural network could begin to explain a child’s ability to learn the rules of language or to form representations capturing people’s memory for the members of a category.

Geoffrey Hinton

Meanwhile, in 1978, Rumelhart and McClelland were joined at UCSD by Geoffrey Hinton, a visiting scholar who brought new perspectives to the group after completing his Ph.D. in artificial intelligence. Hinton recalls, “After I wrote my thesis, I had dropped out of academia.” He explored other career pathways for a time but then an advertisement for a program at UCSD caught his eye. “I applied and they rejected me!” A while later, Hinton had accepted a different postdoc position, but within two hours of accepting that job, he got a call with an offer for the UCSD position. He quickly withdrew his acceptance of the other position in favor of the UCSD opportunity. “That was one of the best decisions I ever made,” he says.

Hinton’s visiting scholar position ended in 1980, but he returned for 6 months in 1982, which ended up being an intensive period of mapping out the plan for a book that would capture the key ideas for a new framework. “When Geoff came back, we decided on the first three chapters, and each of us wrote one of them,” McClelland says. They named this new approach the “parallel distributed processing” framework and started the PDP Research Group. Several others, including physicists, neurobiologists, and mathematicians, also joined the group. Francis Crick, the Nobel-Prize-winning co-discoverer of the structure of DNA, had become interested in how the brain supports visual perception, and he also participated in the group’s meetings.

The Parallel Distributed Processing Framework

The parallel distributed processing framework, or PDP for short, describes how cognitive processes can occur within a network of simple, interconnected, neuron-like processing units. The framework has three basic principles: (1) processing is distributed across the units within the network; (2) learning occurs through changes in the strengths of the connections between the units, and these changes depend on the propagation of signals between the units; and (3) memories are not stored explicitly like files in a computer but are reconstructions of past experiences, using the learned connection strengths between the neuron-like units. “One way to think of a neural network,” suggests McClelland, “is to think of it as a kind of community, where collective outcomes depend on everyone working together in communication with each other. One key breakthrough was to make the units and the signals they can send to each other as simple as possible. The intelligence cannot be found in any of the parts – it emerges from the interactions of the components with each other.”
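To make these three principles concrete, the sketch below is a minimal Python illustration using a tiny Hopfield-style auto-associative network; it is not one of the PDP group’s own models, and the stored patterns, network size, and update schedule are all illustrative assumptions.

```python
import numpy as np

# A tiny Hopfield-style auto-associative network used purely as an illustration
# of the three principles above (it is not one of the PDP group's own models).
# Units take values +1/-1, memories live only in the connection strengths, and
# recall is a reconstruction driven by signals the units send to each other.

patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],   # memory A, spread across all eight units
    [1, 1, -1, -1, 1, 1, -1, -1],   # memory B
])

# (2) Learning: connection strengths change based on how units co-activate
# (a simple Hebbian rule here).
weights = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(weights, 0.0)      # no unit connects to itself

# (3) Memory as reconstruction: start from a degraded cue and let the units
# repeatedly update from the signals arriving over the learned connections.
cue = patterns[0].astype(float)
cue[0] = -cue[0]                    # corrupt one unit of memory A

state = cue
for _ in range(5):                  # (1) processing is parallel, spread across all units
    state = np.sign(weights @ state)

print(np.array_equal(state, patterns[0]))  # True: the full memory is reconstructed
```

Even in this toy example, no single unit or connection holds “the memory”; the stored pattern is recovered only through the joint activity of all the units.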

Parallel Distributed Processing Books, Volumes 1 & 2

The mid to late 1980s was a pivotal time for development of the PDP framework. The team of researchers received funding support from the Department of Defense Office of Naval Research and from a private foundation to continue their work. In 1986, McClelland and Rumelhart published the two-volume “Parallel Distributed Processing” book, which would become a central text in the field, describing a mathematically explicit, algorithmic formalization of the PDP framework of cognition. These two volumes spurred much of the cognitive science community to develop, explore, and test new computational models of phenomena in learning, memory, language, and cognitive development. Introductory chapters laid out the broad reasons for interest in these kinds of models as ways of capturing human cognitive abilities. Rumelhart laid out the framework and described what a model neuron looks like, the mathematical equations for how inputs produce activity in these neurons, and basic ideas about connections. His sons, Karl and Peter, recall that their father would rarely elect to put his name first on a publication — “it was a testament to the team culture he believed in cultivating, and which flourished in the group.” But Rumelhart did place his name first on several key chapters and assumed the role of first author overall, a sign of the importance he gave to this work. “We all agreed that this was fully justified,” says McClelland, “given the depth of Dave’s insights and the seminal thinking he brought to the effort to understand both information processing and learning in neural networks.”

The rest of the book helped frame future directions, elaborations of these foundational ideas, and their applications. Hinton, who was originally going to be the third editor of the book, had pivoted in 1985 to pursuing another model he believed was a better theory of how the brain worked. “I said the future is actually in Boltzmann machines so I'm dropping out of editing these books. That was a big mistake,” Hinton says. While not an editor of the book, Hinton remained a significant influence on its overall development and contributed to several conceptual ideas covered in it.

Origins of the Learning Algorithm That Powers Today’s AI Systems

As the PDP framework was being developed, Rumelhart and Hinton were both interested in addressing the limitations of the existing models for learning, or adjusting connections, in neural networks. One of the ideas they explored was backpropagation, a learning algorithm that makes it possible to train connections across the layers of a deep (i.e., multi-layer) neural network, overcoming the one-layer limitation of Rosenblatt’s Perceptron. Ideas related to backpropagation were explored by several researchers, but it was Rumelhart, working with Hinton and mathematician Ron Williams, who systematically developed the algorithm and applied it to many of the challenges Minsky and Papert had posed for neural networks in Perceptrons. The algorithm was introduced to a broad audience in 1986 in a landmark Nature paper, “Learning Representations by Back-propagating Errors,” co-authored by Rumelhart, Hinton, and Williams. The paper detailed how backpropagation worked and how it could be used to efficiently train multi-layer neural networks by adjusting the connections between neurons to minimize errors.
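The core idea can be illustrated in a few lines of code. The sketch below is a minimal modern Python/NumPy rendering of backpropagation on the XOR problem, a task that a single layer of adaptive connections cannot solve; it is not the 1986 implementation, and the network size, learning rate, and number of training steps are arbitrary choices made for this example.

```python
import numpy as np

# A minimal modern sketch of backpropagation on the XOR problem, which a single
# layer of adaptive connections cannot solve. All sizes and settings here are
# arbitrary choices for illustration, not details of the 1986 work.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8))        # input -> hidden connections
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))        # hidden -> output connections
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(10000):
    # Forward pass: activity flows through the layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error signal back through the layers to
    # work out how each connection contributed to it.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Adjust every connection strength a little to reduce the error.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))     # typically close to [0, 1, 1, 0]
```

Errors measured at the output are passed backward to assign credit to each hidden connection, which is what lets the hidden layer learn intermediate features the Perceptron could not.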

The co-authors thought backpropagation would emerge as an effective application of PDP and neural networks. “We thought it would solve everything. And we were a bit puzzled about why it didn’t solve everything,” Hinton recalls. Rumelhart had even been advised to patent the algorithm to retain proprietary rights. While McClelland and his collaborators relied on backpropagation to model many acquired human abilities, such as reading, and the gradual emergence of children’s understanding of the physical and natural world during the first decade of life, backpropagation failed to take off as an effective basis for artificial intelligence. The computers of the 1980s lacked the computational power needed to handle the extensive calculations required by large networks, and early, small networks could only be trained on small datasets, which turned out to be insufficient for realistic applications. However, as successive generations of computers became faster and more powerful, backpropagation would have an opportunity to re-emerge thanks to the groundwork laid by the Nature paper.

Decades Later…A Breakthrough

By the late 1980s, the initial PDP research group members had physically dispersed – Rumelhart joined the faculty at Stanford University, McClelland at Carnegie Mellon University, and Hinton at the University of Toronto – but they each maintained some level of collaboration in the field while also exploring their own research interests.

Given backpropagation’s limited early success in building effective artificial neural networks, Hinton and others moved on to exploring alternative algorithms that might overcome some of its apparent limitations. But as computational power increased and larger data sets became available early in the new millennium, Hinton revisited backpropagation. In 2012, he and two of his students, using backpropagation on Nvidia’s highly parallelized processors called “graphics processing units,” trained a large neural network on a large amount of data and achieved a big jump in accuracy on the computer vision problem of image classification. “[Other] AI systems got 25 percent errors, and we got 15 percent errors. It was a huge improvement,” Hinton recalls.

By the mid-2010s, AI research was surging, driven by the success of ever-larger scale AI systems relying on backpropagation in artificial neural networks and trained on larger data sets. The essential idea underlying backpropagation lies at the heart of today’s AI systems, including systems that can recognize and synthesize images, understand and produce speech, and are beginning to capture some aspects of our most cherished human understanding and reasoning abilities.

Long Lasting Impacts

The work done by Rumelhart, McClelland, and Hinton began with a simple curiosity: to find an alternative framework that could more completely explain human cognitive functions in the brain. Yet this basic research laid the groundwork for a revolution in machine learning and artificial intelligence. While modeling and tinkering with applications of the framework was always a part of the work, they couldn’t have foreseen their research underpinning the technologies being developed by trillion-dollar companies today. The PDP framework has profoundly influenced artificial intelligence by demonstrating how neural networks can learn complex patterns and representations through distributed processing and error correction, paving the way for modern deep learning techniques and improving our understanding of how learning and cognition can be modeled computationally. The PDP book published in 1986 has been cited over 30,000 times and is often regarded as “the bible” of neural-network-based computational modeling. The book’s core proposals are now standard starting assumptions in many domains of research.

In 1998, David Rumelhart retired from Stanford when the symptoms of Pick’s disease, an Alzheimer's-like disorder, became disabling. He passed away from complications of the disease in 2011, but the impact of his work – from the people he collaborated with and trained to the early technological breakthroughs in artificial intelligence – lives on today. In 2001, the David E. Rumelhart Prize was conceived by Robert J. Glushko, a former student of Rumelhart’s, to honor outstanding contributions to the theoretical foundations of human cognition. The first recipient was Geoffrey Hinton in 2001 and, after chairing the selection committee for several years, James McClelland received the prize in 2010. The Rumelhart Prize honors Rumelhart, the prize recipients, and the broader community of cognitive scientists striving to develop a scientific understanding of minds and mental abilities, drawing insights from a wide range of academic disciplines.

The PDP framework continues to provide a foundation for models of human cognitive abilities, and the effects of brain disorders such as Pick’s disease on these abilities. The framework helped to form the basis for the modern computational approaches that underpin technologies, such as ChatGPT and Bing, and that exceed human abilities in cognitively demanding games like chess and Go. In space, NASA has used artificial neural networks to program the Mars rover so it can learn and adapt to unknown terrain on its own. The framework is beginning to be used to develop systems that can help humans learn and may even help delay the progression of cognitive decline in neural disorders like dementia.

While the impacts of this research have been profound, we shouldn’t think that the resulting technologies fully capture all aspects of human intelligence or solve all of society’s problems. The machines we have today don’t replicate the full range of human cognitive abilities, and one concern is that backpropagation may not fully capture the actual learning algorithms used by the brain. “One key limitation,” says McClelland, “is that people get by with far less training data than AI systems trained with backpropagation.”

While it is not perfect, backpropagation has allowed us to understand a lot about human behavior and can continue to help us explore and advance our ability to build machines that have truly human-like intelligence. Indeed, backpropagation may have the potential to allow large-scale artificial systems to learn more than a human could ever learn, eventually outsmarting humans with potentially profound or even catastrophic consequences. Hinton and McClelland both agree that as a society we should oversee AI technologies to limit potential negative outcomes. At the same time, the exploration of new ideas for capturing intelligence should continue to receive support from governmental and non-profit organizations. Our understanding of intelligence remains incomplete, and research will continue to unlock new possibilities and new forms of understanding for the next breakthrough.

By Meredith Asbury