Fundamentals of Reinforcement Learning
This course is part of Reinforcement Learning Specialization
Instructors: Martha White
110,054 already enrolled
2,903 reviews
What you'll learn
Formalize problems as Markov Decision Processes
Understand basic exploration methods and the exploration / exploitation tradeoff
Understand value functions, as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
There are 5 modules in this course
Reinforcement Learning is a subfield of Machine Learning, but is also a general-purpose formalism for automated decision-making and AI. This course introduces you to statistical learning techniques where an agent explicitly takes actions and interacts with the world. Understanding the value and the challenges of learning agents that make decisions matters more than ever, as more and more companies take an interest in interactive agents and intelligent decision-making.
This course introduces you to the fundamentals of Reinforcement Learning. When you finish this course, you will:
- Formalize problems as Markov Decision Processes
- Understand basic exploration methods and the exploration/exploitation tradeoff
- Understand value functions, as a general-purpose tool for optimal decision-making
- Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
This course teaches you the key concepts of Reinforcement Learning that underlie classic and modern RL algorithms. After completing this course, you will be able to start using RL for real problems where you have, or can specify, the MDP. This is the first course of the Reinforcement Learning Specialization.
Welcome to Fundamentals of Reinforcement Learning, the first course in a four-part specialization on Reinforcement Learning brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, get a flavour of what the course has in store for you, and be given an in-depth roadmap to help make your journey through this specialization as smooth as possible.
What's included
4 videos, 2 readings, 1 discussion prompt
4 videos • Total 20 minutes
- Specialization Introduction • 3 minutes
- Course Introduction • 6 minutes
- Meet your instructors! • 8 minutes
- Your Specialization Roadmap • 3 minutes
2 readings • Total 20 minutes
- Reinforcement Learning Textbook • 10 minutes
- Read Me: Pre-requisites and Learning Objectives • 10 minutes
1 discussion prompt • Total 10 minutes
- Meet and Greet! • 10 minutes
For the first week of this course, you will learn how to understand the exploration-exploitation trade-off in sequential decision-making, implement incremental algorithms for estimating action-values, and compare the strengths and weaknesses of different algorithms for exploration. For this week's graded assessment, you will implement and test an epsilon-greedy agent.
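As a rough illustration of the ideas named above, here is a minimal sketch of an epsilon-greedy agent that estimates action-values incrementally with sample averages. It is not the course's assignment code: the class name `EpsilonGreedyAgent`, the helper `run_bandit`, and the Gaussian reward model are all assumptions made for this example.

```python
import random


class EpsilonGreedyAgent:
    """Hypothetical epsilon-greedy bandit agent with sample-average estimates."""

    def __init__(self, num_actions, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.q_values = [0.0] * num_actions      # current action-value estimates
        self.action_counts = [0] * num_actions   # times each action was taken
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore: with probability epsilon, pick an action uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q_values))
        # Exploit: pick a greedy action, breaking ties at random.
        best = max(self.q_values)
        best_actions = [a for a, q in enumerate(self.q_values) if q == best]
        return self.rng.choice(best_actions)

    def update(self, action, reward):
        # Incremental sample average: Q <- Q + (1/N) * (R - Q).
        self.action_counts[action] += 1
        step_size = 1.0 / self.action_counts[action]
        self.q_values[action] += step_size * (reward - self.q_values[action])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run the agent on an assumed Gaussian bandit with unit-variance rewards."""
    agent = EpsilonGreedyAgent(len(true_means), epsilon, seed)
    noise = random.Random(seed + 1)
    for _ in range(steps):
        action = agent.select_action()
        reward = noise.gauss(true_means[action], 1.0)
        agent.update(action, reward)
    return agent
```

Calling `run_bandit([0.0, 1.0, 3.0])` returns an agent whose `q_values` can be compared against the true means. A larger `epsilon` explores more and exploits less, which is the trade-off this week focuses on.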
What's included
8 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt, 2 plugins
8 videos • Total 46 minutes
- Sequential Decision Making with Evaluative Feedback • 6 minutes
- Learning Action Values • 5 minutes
- Estimating Action Values Incrementally • 5 minutes
- What is the trade-off? • 8 minutes
- Optimistic Initial Values • 6 minutes
- Upper-Confidence Bound (UCB) Action Selection • 5 minutes
- Jonathan Langford: Contextual Bandits for Real World Reinforcement Learning • 9 minutes
- Week 1 Summary • 3 minutes
3 readings • Total 70 minutes
- Module 1 Learning Objectives • 10 minutes
- Weekly Reading • 30 minutes
- Chapter Summary • 30 minutes
1 assignment • Total 45 minutes
- Sequential Decision-Making • 45 minutes
1 programming assignment • Total 30 minutes
- Bandits and Exploration/Exploitation • 30 minutes
1 discussion prompt • Total 10 minutes
- Compare bandits to supervised learning • 10 minutes
2 plugins • Total 30 minutes
- Let's play a game! • 15 minutes
- What's underneath? • 15 minutes
When you're presented with a problem in industry, the first and most important step is to translate that problem into a Markov Decision Process (MDP). The quality of your solution depends heavily on how well you do this translation. This week, you will learn the definition of MDPs, you will understand goal-directed behavior and how this can be obtained from maximizing scalar rewards, and you will also understand the difference between episodic and continuing tasks. For this week's graded assessment, you will create three example tasks of your own that fit into the MDP framework.
What's included
7 videos, 2 readings, 1 assignment, 1 peer review, 1 discussion prompt
7 videos • Total 36 minutes
- Markov Decision Processes • 7 minutes
- Examples of MDPs • 4 minutes
- The Goal of Reinforcement Learning • 3 minutes
- Michael Littman: The Reward Hypothesis • 12 minutes
- Continuing Tasks • 5 minutes
- Examples of Episodic and Continuing Tasks • 3 minutes
- Week 2 Summary • 2 minutes
2 readings • Total 40 minutes
- Module 2 Learning Objectives • 10 minutes
- Weekly Reading • 30 minutes
1 assignment • Total 45 minutes
- MDPs • 45 minutes
1 peer review • Total 60 minutes
- Graded Assignment: Describe Three MDPs • 60 minutes
1 discussion prompt • Total 10 minutes
- Is the reward hypothesis sufficient? • 10 minutes
Once the problem is formulated as an MDP, finding the optimal policy is more efficient when using value functions. This week, you will learn the definition of policies and value functions, as well as Bellman equations, which are the key technology that all of our algorithms will use.
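For orientation, the Bellman equations mentioned above can be written as follows. The notation is an assumption borrowed from the standard textbook convention and is not stated on this page: $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the MDP dynamics, and $\gamma$ the discount factor.

```latex
% Bellman expectation equation for the state-value function of a policy \pi
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
           \bigl[ r + \gamma \, v_\pi(s') \bigr]

% Bellman optimality equation for the optimal state-value function
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)
         \bigl[ r + \gamma \, v_*(s') \bigr]
```

Both equations express the value of a state in terms of the values of its possible successor states, which is what makes them useful as update rules.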
What's included
9 videos, 3 readings, 2 assignments, 1 discussion prompt
9 videos • Total 56 minutes
- Specifying Policies • 5 minutes
- Value Functions • 6 minutes
- Rich Sutton and Andy Barto: A brief History of RL • 8 minutes
- Bellman Equation Derivation • 6 minutes
- Why Bellman Equations? • 5 minutes
- Optimal Policies • 8 minutes
- Optimal Value Functions • 5 minutes
- Using Optimal Value Functions to Get Optimal Policies • 8 minutes
- Week 3 Summary • 4 minutes
3 readings • Total 53 minutes
- Module 3 Learning Objectives • 10 minutes
- Weekly Reading • 30 minutes
- Chapter Summary • 13 minutes
2 assignments • Total 90 minutes
- [Practice] Value Functions and Bellman Equations • 45 minutes
- [Graded] Value Functions and Bellman Equations • 45 minutes
1 discussion prompt • Total 10 minutes
- Check-in • 10 minutes
This week, you will learn how to compute value functions and optimal policies, assuming you have the MDP model. You will implement dynamic programming to compute value functions and optimal policies and understand the utility of dynamic programming for industrial applications and problems. Further, you will learn about Generalized Policy Iteration as a common template for constructing algorithms that maximize reward. For this week's graded assessment, you will implement an efficient dynamic programming agent in a simulated industrial control problem.
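To make the dynamic programming loop above more concrete, here is a small sketch of policy iteration (iterative policy evaluation alternated with greedy policy improvement) on a toy MDP. It is not the graded assignment: the three-state chain, the dictionary layout `transitions[state][action] = [(prob, next_state, reward), ...]`, and all function names are hypothetical choices made for this example.

```python
GAMMA = 0.9  # assumed discount factor for this toy example


def make_chain_mdp():
    """Hypothetical 3-state chain; state 2 is terminal (no actions)."""
    return {
        0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
        1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
        2: {},
    }


def action_value(mdp, values, state, action, gamma):
    # Expected one-step return: sum over outcomes of p * (r + gamma * v(s')).
    return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[state][action])


def evaluate_policy(mdp, policy, gamma=GAMMA, tol=1e-10):
    """Iterative policy evaluation with in-place sweeps over the states."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if not mdp[s]:  # terminal state keeps value 0
                continue
            new_v = action_value(mdp, values, s, policy[s], gamma)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def improve_policy(mdp, values, gamma=GAMMA):
    """Greedy policy improvement with respect to the current values."""
    return {
        s: max(mdp[s], key=lambda a: action_value(mdp, values, s, a, gamma))
        for s in mdp
        if mdp[s]
    }


def policy_iteration(mdp, gamma=GAMMA):
    # Start from an arbitrary policy: the first listed action in each state.
    policy = {s: next(iter(mdp[s])) for s in mdp if mdp[s]}
    while True:
        values = evaluate_policy(mdp, policy, gamma)
        new_policy = improve_policy(mdp, values, gamma)
        if new_policy == policy:  # policy is stable, so it is greedy w.r.t. its own values
            return policy, values
        policy = new_policy
```

Calling `policy_iteration(make_chain_mdp())` alternates evaluation and improvement until the policy stops changing. This alternation is one instance of the Generalized Policy Iteration template mentioned above.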
What's included
10 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt
10 videos • Total 72 minutes
- Policy Evaluation vs. Control • 5 minutes
- Iterative Policy Evaluation • 9 minutes
- Policy Improvement • 4 minutes
- Policy Iteration • 8 minutes
- Flexibility of the Policy Iteration Framework • 4 minutes
- Efficiency of Dynamic Programming • 5 minutes
- Warren Powell: Approximate Dynamic Programming for Fleet Management (Short) • 8 minutes
- Warren Powell: Approximate Dynamic Programming for Fleet Management (Long) • 22 minutes
- Week 4 Summary • 3 minutes
- Congratulations! • 4 minutes
3 readings • Total 70 minutes
- Module 4 Learning Objectives • 10 minutes
- Weekly Reading • 30 minutes
- Chapter Summary • 30 minutes
1 assignment • Total 45 minutes
- Dynamic Programming • 45 minutes
1 programming assignment • Total 30 minutes
- Optimal Policies with Dynamic Programming • 30 minutes
1 discussion prompt • Total 10 minutes
- Where can you use dynamic programming? • 10 minutes
Learner reviews
- 5 stars: 81.78%
- 4 stars: 14.29%
- 3 stars: 2.61%
- 2 stars: 0.44%
- 1 star: 0.86%
Showing 3 of 2903
Reviewed on May 6, 2023
Excellent course, with a very nice presentation style, both the professors are excellent in their presentations and the material is well researched and delivered. A very valuable course.
Reviewed on Jan 2, 2021
The book is essential reading. It took me longer than the estimates to do the reading and the programming assignments. I would have liked more gridworld examples to get a faster hang of it.
Reviewed on Sep 1, 2019
All the concepts were well explained and this course was perhaps the best I have found for RL. Great efforts have been put into making the course and it goes well in line with the suggested textbook.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.
