Decision Making and Reinforcement Learning

Columbia University

Instructor: Tony Dear

4,652 already enrolled

8 modules

Gain insight into a topic and learn the fundamentals.

4.4

24 reviews

Intermediate level

5 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Map between qualitative preferences and appropriate quantitative utilities.
Model non-associative and associative sequential decision problems with multi-armed bandit problems and Markov decision processes respectively
Implement dynamic programming algorithms to find optimal policies
Implement basic reinforcement learning algorithms using Monte Carlo and temporal difference methods

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

8 assignments

Taught in English

There are 8 modules in this course

This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We first model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluating feedback. We then model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, which is modeled by POMDPs and handled with online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples will be a key part of this course.

Welcome to Decision Making and Reinforcement Learning! During this week, Professor Tony Dear provides an overview of the course. You will also view guidelines to support your learning journey towards modeling sequential decision problems and implementing reinforcement learning algorithms.

What's included

6 videos, 6 readings, 1 assignment, 1 programming assignment, 3 discussion prompts, 1 plugin

6 videos•Total 39 minutes

Introduction to Decision Making and Reinforcement Learning•2 minutes
Course Logistics•3 minutes
1.1 Rational Agents and Utility Theory•9 minutes
1.2 Preferences and Axioms of Utility Theory•9 minutes
1.3 Uncertain and Multi-Attribute Utilities•10 minutes
1.4 Value of Perfect Information•7 minutes

6 readings•Total 60 minutes

Course Syllabus•10 minutes
About the Instructor•10 minutes
Academic Honesty Policy•10 minutes
Discussion Forum Etiquette•10 minutes
Pre-Course Survey•10 minutes
Week 1 Lesson Materials•10 minutes

1 assignment•Total 30 minutes

Utility Theory•30 minutes

1 programming assignment•Total 180 minutes

Utility Theory•180 minutes

3 discussion prompts•Total 30 minutes

Introduce Yourself!•10 minutes
Discussion on Utility Theory•10 minutes
Week 1 Questions and Feedback•10 minutes

1 plugin•Total 15 minutes

Pre-Course Survey•15 minutes

Welcome to week 2! This week, we will learn about multi-armed bandit problems, a type of optimization problem in which the algorithm balances exploration and exploitation to maximize rewards. Topics include action values and sample-average estimation, ε-greedy action selection, and the upper confidence bound. You can post in the discussion forum if you need help with the quiz or assignment.
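To make the exploration and exploitation trade-off concrete, here is a minimal sketch of ε-greedy action selection with sample-average estimates. It is not taken from the course materials: the function name `run_epsilon_greedy`, the Gaussian reward model, and the arm means are all assumptions chosen for illustration.

```python
import random

def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a hypothetical Gaussian bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    q = [0.0] * n_arms      # sample-average estimate of each action value
    counts = [0] * n_arms   # number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                    # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: q[a])   # exploit: greedy arm
        reward = rng.gauss(true_means[arm], 1.0)           # assumed unit-variance noise
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]          # incremental sample average
    return q, counts

# Hypothetical three-armed bandit; the third arm has the highest mean reward.
estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.5])
print(estimates, counts)
```

With ε = 0, the agent never explores and can lock onto a suboptimal arm; larger ε trades some reward for better estimates of every arm.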

What's included

3 videos, 1 reading, 1 assignment, 1 programming assignment, 2 discussion prompts

3 videos•Total 36 minutes

2.1 Multi-Armed Bandits and Action Values•9 minutes
2.2 ε-Greedy Action Selection•13 minutes
2.3 Upper Confidence Bound•14 minutes

1 reading•Total 10 minutes

Week 2 Lesson Materials•10 minutes

1 assignment•Total 30 minutes

Multi-Armed Bandit Problems•30 minutes

1 programming assignment•Total 180 minutes

Multi-Armed Bandit Problems•180 minutes

2 discussion prompts•Total 20 minutes

Discussion on Multi-Armed Bandits•10 minutes
Week 2 Questions and Feedback•10 minutes

Welcome to week 3! This week, we will focus on the basics of the Markov decision process, including rewards, utilities, discounting, policies, value functions, and Bellman equations. You will model sequential decision problems, understand the impact of rewards and discount factors on outcomes, define policies and value functions, and write Bellman equations for optimal solutions. You can post in the discussion forum if you need help with the quiz or assignment.
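For reference, the quantities named in this module are commonly written as follows. The notation here (transition model P, reward function R, discount factor γ, policy π) follows standard textbook conventions and is an assumption; the lectures may use different symbols.

```latex
\begin{align*}
G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1 \\
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^{\pi}(s') \bigr] \\
V^{*}(s) &= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^{*}(s') \bigr]
\end{align*}
```

The first line is the discounted return, the second is the Bellman equation for a fixed policy, and the third is the Bellman optimality equation, which replaces the average over the policy's actions with a maximum.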

What's included

6 videos, 1 reading, 1 assignment, 1 programming assignment, 3 discussion prompts

6 videos•Total 36 minutes

3.1 Markov Decision Process Framework•4 minutes
3.2 Gridworld Example•8 minutes
3.3 Rewards, Utilities, and Discounting•7 minutes
3.4 Policies and Value Functions•6 minutes
3.5 Example: Mini-Gridworld•5 minutes
3.6 Bellman Optimality Equations•4 minutes

1 reading•Total 10 minutes

Week 3 Lesson Materials•10 minutes

1 assignment•Total 30 minutes

Sequential Decision Problems•30 minutes

1 programming assignment•Total 180 minutes

Bellman Equations•180 minutes

3 discussion prompts•Total 30 minutes

Discussion on Sequential Decision Problem - Part 1•10 minutes
Discussion on Sequential Decision Problem - Part 2•10 minutes
Week 3 Questions and Feedback•10 minutes

Welcome to week 4! This week, we will cover dynamic programming algorithms for solving Markov decision processes (MDPs). Topics include value iteration and policy iteration, nonlinear Bellman equations, complexity and convergence, and a comparison of the two approaches. You can post in the discussion forum if you need help with the quiz or assignment.
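As a rough illustration of value iteration, the sketch below repeatedly applies the Bellman optimality backup to a tiny deterministic MDP. The dictionary layout `P[state][action] = [(prob, next_state, reward), ...]`, the three-state chain, and the in-place update order are assumptions made for this example, not the format used in the course assignments.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """In-place value iteration on an MDP given as P[s][a] = [(prob, s2, reward), ...]."""
    V = {s: 0.0 for s in P}

    def backup(s, a):
        # Expected one-step reward plus discounted value of the successor.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            if not P[s]:                 # terminal state: no actions, value stays 0
                continue
            best = max(backup(s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:                  # stop once no value changed noticeably
            break
    # Extract a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: backup(s, a)) for s in P if P[s]}
    return V, policy

# Hypothetical chain A -> B -> goal, with reward 1 for reaching the goal.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "goal", 1.0)]},
    "goal": {},
}
V, policy = value_iteration(P)
print(V, policy)
```

Policy iteration would instead alternate between evaluating a fixed policy and improving it greedily; both approaches reach the same optimal values on a finite MDP with γ < 1.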

What's included

6 videos, 1 reading, 1 assignment, 2 programming assignments, 3 discussion prompts

6 videos•Total 42 minutes

4.1 Time-Limited Values•8 minutes
4.2 Value Iteration•7 minutes
4.3 Value Iteration Implementation•8 minutes
4.4 Policy Iteration•9 minutes
4.5 Example: Mini-Gridworld•4 minutes
4.6 Algorithm Complexity•7 minutes

1 reading•Total 10 minutes

Week 4 Lesson Materials•10 minutes

1 assignment•Total 30 minutes

Markov Decision Processes•30 minutes

2 programming assignments•Total 360 minutes

Value Iteration•180 minutes
Policy Iteration•180 minutes

3 discussion prompts•Total 35 minutes

Discussion on Markov Decision Processes•15 minutes
Discussion on Policy Iteration vs. Value Iteration•10 minutes
Week 4 Questions and Feedback•10 minutes

Welcome to week 5! This week, we will go through topics on partial observability and POMDPs, belief states, representation as belief MDPs, and online planning in MDPs and POMDPs. You will also apply your knowledge to update the belief state and employ a belief transition function to calculate state values. You can post in the discussion forum if you need help with the quiz or assignment.
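The belief-state update mentioned above can be sketched as a Bayes filter: predict the next state distribution with the transition model, weight it by the observation likelihood, and renormalize. The function name `update_belief`, the dictionary layouts, and the two-state example are assumptions for illustration; the sketch also assumes the observation depends only on the resulting state.

```python
def update_belief(belief, action, observation, T, O):
    """Return b'(s2) proportional to O[s2][o] * sum_s T[s][a][s2] * b(s)."""
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 after taking the action.
        predicted = sum(T[s][action].get(s2, 0.0) * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s2.
        unnormalized[s2] = O[s2].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / total for s2, v in unnormalized.items()}

# Hypothetical two-state problem: "listen" leaves the state unchanged and
# reports the correct side with probability 0.8.
T = {"left": {"listen": {"left": 1.0}}, "right": {"listen": {"right": 1.0}}}
O = {
    "left": {"hear_left": 0.8, "hear_right": 0.2},
    "right": {"hear_left": 0.2, "hear_right": 0.8},
}
belief = {"left": 0.5, "right": 0.5}
new_belief = update_belief(belief, "listen", "hear_left", T, O)
print(new_belief)
```

Starting from a uniform belief, a single "hear_left" observation shifts the belief to roughly 0.8 for "left" and 0.2 for "right".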

What's included

5 videos, 2 readings, 1 assignment, 1 programming assignment, 3 discussion prompts

5 videos•Total 35 minutes

5.1 Partial Observability and POMDP•5 minutes
5.2 Belief States•9 minutes
5.3 Belief Transition Model•7 minutes
5.4 Policies and Value Functions•10 minutes
5.5 Example: Mini-Gridworld•5 minutes

2 readings•Total 20 minutes

Week 5 Lesson Materials•10 minutes
Summary of Weeks 3, 4, and 5•10 minutes

1 assignment•Total 30 minutes

POMDPs•30 minutes

1 programming assignment•Total 180 minutes

POMDPs•180 minutes

3 discussion prompts•Total 35 minutes

Discussion on POMDPs - Part 1•15 minutes
Discussion on POMDPs - Part 2•10 minutes
Week 5 Questions and Feedback•10 minutes

Welcome to week 6! This week, we will introduce Monte Carlo methods and cover topics related to state value estimation using sample averaging and Monte Carlo prediction, state-action values and ε-greedy policies, and importance sampling for off-policy vs. on-policy Monte Carlo control. You will learn to estimate state values and state-action values, use importance sampling, and implement off-policy Monte Carlo control for optimal policy learning. You can post in the discussion forum if you need help with the quiz or assignment.
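A minimal sketch of first-visit Monte Carlo prediction follows: each state's value is estimated as the average of the returns observed after its first visit in each episode. The episode format (a list of `(state, reward)` pairs, where the reward is received on leaving that state) and the sample episodes are assumptions for this example.

```python
from collections import defaultdict

def first_visit_mc_prediction(episodes, gamma=1.0):
    """Average first-visit returns over episodes of (state, reward) pairs."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g is the return from each step onward. Earlier
        # visits overwrite later ones, leaving the first-visit return.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns_sum[state] += g_first
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Two hypothetical episodes; state "A" is visited twice in the second one.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 1.0), ("A", 0.0), ("B", 2.0)],
]
values = first_visit_mc_prediction(episodes)
print(values)
```

In the second episode only the return from the first visit to "A" (3.0) is counted, so the estimate for "A" is (1.0 + 3.0) / 2 = 2.0.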

What's included

6 videos, 2 readings, 1 assignment, 1 programming assignment, 2 discussion prompts

6 videos•Total 42 minutes

6.1 Monte Carlo Methods•5 minutes
6.2 First-Visit MC Prediction•7 minutes
6.3 State-Action Values•5 minutes
6.4 ε-Greedy On-Policy MC Control•8 minutes
6.5 On and Off-Policy MC Control•7 minutes
6.6 Example: Mini-Gridworld•9 minutes

2 readings•Total 20 minutes

Week 6 Lesson Materials•10 minutes
Post-Lecture Reading•10 minutes

1 assignment•Total 30 minutes

Monte Carlo RL•30 minutes

1 programming assignment•Total 180 minutes

Monte Carlo•180 minutes

2 discussion prompts•Total 20 minutes

Discussion on Monte Carlo RL•10 minutes
Week 6 Questions and Feedback•10 minutes

Welcome to week 7! This week, we will cover topics related to temporal difference learning for prediction, TD batch methods, SARSA for on-policy control, and Q-learning for off-policy control. You will learn to implement TD prediction, TD batch and offline methods, SARSA, and Q-learning, and compare on-policy vs. off-policy TD learning. You will then apply your knowledge in solving a Tic-Tac-Toe programming assignment. You can post in the discussion forum if you need help with the quiz or assignment.
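The on-policy vs. off-policy distinction shows up directly in the update targets. The sketch below places the two tabular updates side by side; the function names, the nested-dictionary Q-table, and the sample transition are assumptions for illustration, and terminal states are assumed to have all-zero action values.

```python
import copy

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a2 actually chosen in s2."""
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy (maximum-value) action in s2."""
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Hypothetical Q-table and one transition (s, "right", reward 0, s2).
Q0 = {
    "s": {"left": 0.0, "right": 0.0},
    "s2": {"left": 1.0, "right": 2.0},
}
Q_sarsa = copy.deepcopy(Q0)
Q_qlearn = copy.deepcopy(Q0)

# Suppose the behavior policy explores and picks "left" in s2.
sarsa_update(Q_sarsa, "s", "right", 0.0, "s2", "left", alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, "s", "right", 0.0, "s2", alpha=0.5, gamma=1.0)
print(Q_sarsa["s"]["right"], Q_qlearn["s"]["right"])
```

Because the exploratory action "left" has the lower value in `s2`, SARSA moves its estimate toward 1.0 while Q-learning moves toward 2.0, which is why SARSA's estimates reflect the cost of exploration and Q-learning's do not.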

What's included

5 videos, 2 readings, 1 assignment, 3 programming assignments, 2 discussion prompts

5 videos•Total 35 minutes

7.1 Temporal Difference Learning•7 minutes
7.2 Temporal Difference Prediction•6 minutes
7.3 Batch Updating•5 minutes
7.4 TD Learning for Control•8 minutes
7.5 SARSA vs Q-Learning•9 minutes

2 readings•Total 20 minutes

Week 7 Lesson Materials•10 minutes
Post-Lecture Readings•10 minutes

1 assignment•Total 30 minutes

Temporal Difference Learning•30 minutes

3 programming assignments•Total 420 minutes

Tic-Tac-Toe•60 minutes
Q-Learning•180 minutes
SARSA•180 minutes

2 discussion prompts•Total 20 minutes

Discussion on Temporal Difference RL•10 minutes
Week 7 Questions and Feedback•10 minutes

Welcome to week 8! This module covers n-step temporal difference prediction, n-step SARSA (on-policy and off-policy), model-based RL with Dyna-Q, and function approximation. You will be prepared to implement n-step TD learning, n-step SARSA, and Dyna-Q for model-based learning, and to use function approximation for reinforcement learning. You will apply your knowledge in the Frozen Lake programming environment. You can post in the discussion forum if you need help with the quiz or assignment.
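To show how n-step methods sit between one-step TD and Monte Carlo, here is a small sketch of the n-step return. The function name `n_step_return` and the indexing convention (`rewards[k]` is received on the transition out of step k, `values[k]` estimates the state at step k) are assumptions for this example.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Sum n discounted rewards from step t, then bootstrap from the value estimate.

    If the episode ends within n steps, the result is the plain Monte Carlo return.
    """
    T = len(rewards)                      # episode length
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:                           # episode not finished: bootstrap
        g += gamma ** n * values[end]
    return g

# Hypothetical three-step episode with current value estimates of 0.5.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]             # last entry is the terminal state

td_target = n_step_return(rewards, values, t=0, n=1, gamma=0.5)   # one-step TD
two_step = n_step_return(rewards, values, t=0, n=2, gamma=0.5)
mc_return = n_step_return(rewards, values, t=0, n=3, gamma=0.5)   # full return
print(td_target, two_step, mc_return)
```

With n = 1 the target relies mostly on the bootstrapped estimate, and as n grows it relies more on observed rewards until it matches the Monte Carlo return.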

What's included

4 videos, 3 readings, 1 assignment, 1 programming assignment, 2 discussion prompts, 1 plugin

4 videos•Total 39 minutes

8.1 𝑛-step Temporal Difference Prediction•11 minutes
8.2 𝑛-step SARSA•9 minutes
8.3 Model-Based Methods•8 minutes
8.4 Function Approximation•12 minutes

3 readings•Total 30 minutes

Week 8 Lesson Materials•10 minutes
Post-Lecture Readings•10 minutes
Post-Course Survey•10 minutes

1 assignment•Total 30 minutes

Generalization of Tabular Methods•30 minutes

1 programming assignment•Total 180 minutes

Frozen Lake•180 minutes

2 discussion prompts•Total 25 minutes

Reinforcement Learning in Daily Lives•15 minutes
Week 8 Questions and Feedback•10 minutes

1 plugin•Total 15 minutes

Post-Course Survey•15 minutes

Instructor

Instructor ratings

4.3 (6 ratings)

Tony Dear

Columbia University

1 Course•4,652 learners

Offered by

Columbia University

Learner reviews

5 stars: 66.66%
4 stars: 20.83%
3 stars: 0%
2 stars: 8.33%
1 star: 4.16%

Showing 3 of 24

Reviewed on Jan 20, 2024

Very good introductory and basic to Reinforcement Learning. But programming assignments need more careful compilation and more attention to detail!

Reviewed on Jul 9, 2023

Well-structured course that provides a great introduction to methodologies used in reinforcement learning. I am now eager to experiment more in my own time, to consolidate what I have learned.

URL: https://www.coursera.org/learn/dmrol