
URL: https://thenewstack.io/reinforcement-learning-pioneers-honored-with-acm-turing-prize/

Reinforcement Learning Pioneers Honored With ACM Turing Prize - The New Stack


AI Agents / Tech Culture

Reinforcement Learning Pioneers Honored With ACM Turing Prize

Today's AI agents owe much to researchers Andrew Barto and Richard Sutton.
Mar 6th, 2025 5:14am by Joab Jackson
Feature image produced by Google Gemini AI. 

Two researchers’ early theoretical work on reinforcement learning was recognized Wednesday, as the Association for Computing Machinery named Andrew G. Barto and Richard S. Sutton the winners of the 2024 ACM A.M. Turing Award.

Both researchers were crucial in developing the conceptual and algorithmic foundations of reinforcement learning, a bedrock of current AI-based agent technologies.

They will collectively carry off a $1 million prize (courtesy of Google) for their labors.

The ACM A.M. Turing Award is often called the “Nobel Prize in Computing.” It is named after Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing and proposed the Turing Test, a thought experiment (and current benchmark) for evaluating whether a machine has achieved human-like intelligent behavior.

So this year’s award is quite apropos to its namesake.

“In a 1947 lecture, Alan Turing stated ‘What we want is a machine that can learn from experience,’” noted Jeff Dean, Google’s Chief Scientist for Google DeepMind, in a statement. “Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing’s challenge. Their work has been a lynchpin of progress in AI over the last several decades.”

Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computer Science at the University of Alberta, as well as a research scientist at Keen Technologies (“John Carmack’s AGI Effort”), and a fellow at the Alberta Machine Intelligence Institute.

Full Agency

Reinforcement Learning book cover

Reinforcement learning, inspired by ideas from neuroscience and psychology, formed the basis of agentic AI: computer entities that perceive and act, ideally in ways that fulfill the intent of their users. To do this, agents rely on “rewards,” or feedback on the quality of their behavior.

Barto and Sutton developed many of the basics of reinforcement learning, and shared their learning in the seminal 1998 textbook “Reinforcement Learning: An Introduction.”

The work built on Markov Decision Processes (MDPs), in which an agent makes decisions in a stochastic (partly random) environment and receives a reward signal after each action, with the goal of maximizing its long-term reward.

The standard MDP formalism assumed that the agent knew everything about its environment. Reinforcement learning took the next step and allowed both the environment and its rewards to be unknown to the agent, which must learn them through interaction.
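To make the distinction concrete, here is a minimal sketch of an agent learning in an MDP it knows nothing about. It uses tabular Q-learning, a model-free method introduced by Chris Watkins and covered in Barto and Sutton's textbook, not an algorithm the article attributes to the pair. The five-state corridor, the 10% "slip" chance, and every parameter value are assumptions invented for illustration.

```python
import random

# Hypothetical corridor MDP: states 0..4, where state 4 is the terminal goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left, move right


def step(state, action, rng):
    """Environment dynamics, hidden from the agent: 10% of moves are reversed."""
    if rng.random() < 0.1:
        action = -action
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from sampled transitions and rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a], rng)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [ACTIONS[row.index(max(row))] for row in q[:GOAL]]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The agent only ever calls `step` and observes what comes back; it never reads the slip probability or the reward rule. With these settings the learned policy should settle on moving right in every state.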

“The minimal information requirements of reinforcement learning, combined with the generality of the MDP framework, allows reinforcement learning algorithms to be applied to a vast range of problems,” the ACM announcement summarized.

The duo pioneered the use of neural networks to represent learned functions and proposed agents that combine learning and planning, showing that knowledge acquired about the environment could serve as the basis for planning.

Other techniques the duo pioneered, working with each other or with other researchers, include temporal difference learning, which helped solve reward prediction problems, and policy-gradient methods, which adjust an agent's policy directly and suit high-dimensional action spaces where other reinforcement learning approaches fall short.
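For readers curious about what temporal difference learning looks like in practice, here is a minimal sketch of TD(0) value prediction. The setup is the five-state random walk often used as a teaching example: the walk starts in the middle, steps left or right with equal probability, and pays a reward of 1 only when it exits on the right. The step size, episode count, and function name are assumptions chosen for illustration.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate each state's expected reward with TD(0) updates."""
    rng = random.Random(seed)
    n = 5
    values = [0.5] * n  # initial guess for every non-terminal state
    for _ in range(episodes):
        state = n // 2  # start in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:  # exited left: no reward
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n:  # exited right: reward of 1
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD error: gap between the one-step-ahead estimate and the
            # current prediction for this state.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

The key idea is that each prediction is nudged toward the next prediction rather than waiting for the final outcome. For this walk the true values are 1/6, 2/6, 3/6, 4/6 and 5/6, and the estimates should land close to them.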

Successful Applications

Reinforcement learning got its first big win by beating the best human Go players in 2016 and 2017, via the AlphaGo computer program.

“AI systems descended from AlphaGo have been adapted to tackle other problems. In 2022, researchers used one such system to discover new algorithms for a fundamental mathematical task called matrix multiplication,” Quanta Magazine (@QuantaMagazine) noted in a March 5, 2025 post.

OpenAI’s ChatGPT also owes its success to reinforcement learning. According to the ACM, the large language models behind the service are trained with a technique called reinforcement learning from human feedback (RLHF), which captures human expectations.

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
TNS owner Insight Partners is an investor in: turing, OpenAI.