Generative AI Advanced Fine-Tuning for LLMs

This course is part of multiple programs.

Instructors: Joseph Santarcangelo, Ashutosh Sagar, Wojciech 'Victor' Fulmyk

24,261 already enrolled

2 modules

Gain insight into a topic and learn the fundamentals.

4.4

133 reviews

Intermediate level

Recommended experience

Flexible schedule

9 hours to complete

Learn at your own pace

88%

Most learners liked this course

What you'll learn

In-demand generative AI engineering skills in fine-tuning LLMs that employers are actively seeking
Instruction tuning and reward modeling using Hugging Face, plus understanding LLMs as policies and applying RLHF techniques
Direct preference optimization (DPO) with partition function and Hugging Face, including how to define optimal solutions to DPO problems
Using proximal policy optimization (PPO) with Hugging Face to build scoring functions and tokenize datasets for fine-tuning

Skills you'll gain

Tools you'll learn

Generative AI

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of

When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 2 modules in this course

Fine-tuning large language models (LLMs) is essential for aligning them with specific business needs, improving accuracy, and optimizing performance. In today’s AI-driven world, organizations rely on fine-tuned models to generate precise, actionable insights that drive innovation and efficiency. This course equips aspiring generative AI engineers with the in-demand skills employers are actively seeking.

You’ll explore advanced fine-tuning techniques for causal LLMs, including instruction tuning, reward modeling, and direct preference optimization. Learn how LLMs act as probabilistic policies for generating responses and how to align them with human preferences using tools such as Hugging Face. You’ll dive into reward calculation, reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), the PPO trainer, and optimal strategies for direct preference optimization (DPO). The hands-on labs in the course will provide real-world experience with instruction tuning, reward modeling, PPO, and DPO, giving you the tools to confidently fine-tune LLMs for high-impact applications. Build job-ready generative AI skills in just two weeks! Enroll today and advance your career in AI!

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) through instruction tuning and reward modeling. You’ll begin by defining instruction tuning and learning its process, including dataset loading, text generation pipelines, and training arguments using Hugging Face. You’ll then delve into reward modeling, where you’ll preprocess datasets, apply low-rank adaptation (LoRA) configurations, and quantify response quality to guide model optimization and align the model with human preferences. You’ll also describe and use reward trainers and reward model loss functions. In addition, the hands-on labs will reinforce your learning with practical experience in instruction tuning and reward modeling, so that you can customize LLMs for targeted tasks.
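The module description mentions reward model loss functions without naming one. As a minimal sketch, assuming the course covers the pairwise (Bradley-Terry style) loss commonly used for reward models, the loss on one preference pair is -log(sigmoid(r_chosen - r_rejected)). The function name and the example scores below are hypothetical and are not taken from the course materials.

```python
import math


def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise preference loss: -log(sigmoid(chosen_score - rejected_score)).

    The scores stand in for the scalar outputs of a reward model on a
    preferred ("chosen") and a dispreferred ("rejected") response.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores: the loss shrinks as the chosen response outscores the rejected one.
loss_good_ranking = pairwise_reward_loss(2.0, -1.0)
loss_bad_ranking = pairwise_reward_loss(-1.0, 2.0)
```

Minimizing this loss pushes the reward model to score the preferred response above the rejected one, which is the sense in which it quantifies response quality.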

What's included

6 videos • 4 readings • 2 assignments • 2 app items • 3 plugins

6 videos•Total 36 minutes

Course Introduction•3 minutes
Basics of Instruction-Tuning•7 minutes
Instruction-Tuning with Hugging Face•7 minutes
Reward Modeling: Response Evaluation•5 minutes
Reward Model Training•7 minutes
Reward Modeling with Hugging Face•8 minutes

4 readings•Total 18 minutes

Course Overview•3 minutes
Specialization Overview•10 minutes
Best Practices for Instruction-Tuning Large Language Models•3 minutes
Summary and Highlights•2 minutes

2 assignments•Total 30 minutes

Different Approaches to Instruction-Tuning•21 minutes
Practice Quiz: Instruction-Tuning and Reward Modeling•9 minutes

2 app items•Total 150 minutes

Instruction Fine-Tuning LLMs•90 minutes
Lab: Reward Modeling•60 minutes

3 plugins•Total 35 minutes

Helpful tips for Course Completion•5 minutes
Instruction Tuning•15 minutes
Reward Modeling & Response Evaluation•15 minutes

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) using reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), and direct preference optimization (DPO). You’ll begin by describing how LLMs function as probability distributions and how these can be transformed into policies that generate responses based on input text. You’ll examine the relationship between policies and language models as a function of parameters, such as omega, and how rewards can be calculated using human feedback. This includes training response samples, evaluating agent performance, and defining scoring functions for tasks like sentiment analysis using PPO. You’ll also be able to explain PPO configuration, learning rates, and the PPO trainer’s role in optimizing chatbot responses using Hugging Face tools. The module further introduces DPO, a more direct and efficient way to align models with human preferences. Although complex topics like PPO and reinforcement learning are introduced, you are not expected to understand them in depth for this course. The hands-on labs in this module will allow you to practice applying RLHF and DPO. To support your learning, a cheat sheet and glossary are included for quick reference.
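To make the DPO idea concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the usual formulation, which compares the log-probability the policy assigns to the chosen and rejected responses against a frozen reference model. The function names, the default `beta`, and the example log-probabilities are illustrative assumptions, not values from the course.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


# Hypothetical log-probabilities: a policy identical to the reference, and one
# that has shifted probability mass toward the chosen response.
loss_at_reference = dpo_loss(-10.0, -12.0, -10.0, -12.0)
loss_after_shift = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

Unlike PPO-based RLHF, this objective needs no separately trained reward model or sampling loop during training, which is one reading of the "more direct" description above.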

What's included

10 videos • 5 readings • 3 assignments • 2 app items • 4 plugins

10 videos•Total 59 minutes

Large Language Models (LLMs) as Distributions•7 minutes
From Distributions to Policies•4 minutes
Reinforcement Learning from Human Feedback (RLHF)•8 minutes
Proximal Policy Optimization (PPO)•5 minutes
PPO with Hugging Face•4 minutes
PPO Trainer•6 minutes
DPO: Partition Function•6 minutes
DPO: Optimal Solution•8 minutes
From Optimal Policy to DPO•6 minutes
DPO with Hugging Face•5 minutes

5 readings•Total 18 minutes

Summary and Highlights•4 minutes
Summary and Highlights•3 minutes
Course Conclusion•6 minutes
Congratulations and Next Steps•3 minutes
Thanks from the course team•2 minutes

3 assignments•Total 61 minutes

Fine-Tuning Causal LLMs with Human Feedback and Direct Preference•30 minutes
Practice Quiz: Proximal Policy Optimization (PPO)•21 minutes
Practice Quiz: Direct Preference Optimization (DPO)•10 minutes

2 app items•Total 75 minutes

Lab: Reinforcement Learning from Human Feedback using PPO•30 minutes
Lab: Direct Preference Optimization (DPO) using Hugging Face•45 minutes

4 plugins•Total 60 minutes

Log-derivative Trick•15 minutes
Fine-tune LLMs Locally with InstructLab•15 minutes
Cheat Sheet: Generative AI Advanced Fine-Tuning for LLMs•15 minutes
Glossary: Generative AI Advanced Fine-Tuning for LLMs•15 minutes

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings

3.8 (16 ratings)

Joseph Santarcangelo

IBM

37 Courses•2,497,133 learners

Offered by

IBM

Learner reviews

5 stars
75.18%
4 stars
8.27%
3 stars
3.75%
2 stars
4.51%
1 star
8.27%

Showing 3 of 133

Reviewed on Aug 20, 2025

An excellent course with a wealth of high-quality material, featuring highly informative lessons such as DPO and PPO.

Reviewed on Mar 10, 2025

Great course, love the deep-rooted content. All my concepts are so clear now. Kudos!!

Reviewed on Apr 29, 2026

Good course starts with origins of LLM and brings you up to date with DPO

Frequently asked questions

It takes about 3–5 hours to complete this course, so you can have the job-ready skills you need to impress an employer within just two weeks!

This course is intermediate level, so to get the most out of your learning, you must have basic knowledge of Python, large language models (LLMs), reinforcement learning, and instruction-tuning. You should also be familiar with machine learning and neural network concepts.

This course is part of the Generative AI Engineering with LLMs specialization. When you complete the specialization, you will have the skills and confidence to take on job roles such as AI engineer, data scientist, machine learning engineer, and deep learning engineer, or to work with LLMs as a developer.

Only a modern web browser is required to complete this course and all hands-on labs. You will be provided access to cloud-based environments to complete the labs at no charge.

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

URL: https://www.coursera.org/learn/generative-ai-advanced-fine-tuning-for-llms