URL: https://www.coursera.org/learn/deep-learning-for-ai-part-2

Deep Learning for AI Part 2 | Coursera


Deep Learning for AI Part 2

Gain insight into a topic and learn the fundamentals.
Intermediate level
Some related experience required
2 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

13 assignments

Taught in English

There are 7 modules in this course

This is Part 2 of a two-part graduate sequence in deep learning. Building on the foundations from Part 1, it focuses on advanced generative modeling. You will study autoregressive models, diffusion models, energy-based models, and normalizing flows; see how these techniques converge in multimodal systems such as CLIP (a contrastive text-image encoder) and the text-to-image generators DALL-E 2, Imagen, and Stable Diffusion; and apply generative methods to creative domains such as music generation. The course concludes by synthesizing the full arc, from discriminative foundations to advanced generative AI, and by examining the ethical and societal implications of deploying these systems.

Autoregressive models are built on a deceptively simple principle: the joint probability of a sequence is the product of conditional probabilities of each element given all preceding elements. You will see this chain-rule factorization applied across three concrete systems—an LSTM recipe generator, PixelCNN for image synthesis, and the path from GPT to ChatGPT through reinforcement learning from human feedback.
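The chain-rule factorization can be made concrete in a few lines. The sketch below is a minimal illustration, not course code: it scores a sequence by summing conditional log-probabilities and generates new tokens one at a time with a temperature parameter (a topic the module's readings also cover). The vocabulary and `toy_cond_prob` are hypothetical stand-ins for a trained LSTM, PixelCNN, or GPT, which would compute the same conditionals with a neural network.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical 4-token vocabulary standing in for recipe words.
VOCAB = ["mix", "flour", "bake", "serve"]

def toy_cond_prob(prefix: Sequence[str], token: str) -> float:
    """Hypothetical p(token | prefix): tokens not yet in the prefix are twice
    as likely as repeats. A real model would be a trained neural network."""
    weights = {v: (1.0 if v in prefix else 2.0) for v in VOCAB}
    return weights[token] / sum(weights.values())

def sequence_log_prob(tokens: Sequence[str],
                      cond_prob: Callable[[Sequence[str], str], float]) -> float:
    """Chain rule: log p(x_1..x_T) = sum_t log p(x_t | x_1..x_{t-1})."""
    return sum(math.log(cond_prob(tokens[:t], tok)) for t, tok in enumerate(tokens))

def sample(cond_prob: Callable[[Sequence[str], str], float], length: int,
           temperature: float = 1.0, seed: int = 0) -> List[str]:
    """Generate left to right; log-probs are divided by the temperature before
    renormalizing (T < 1 sharpens the distribution, T > 1 flattens it)."""
    rng = random.Random(seed)
    seq: List[str] = []
    for _ in range(length):
        logits = [math.log(cond_prob(seq, v)) / temperature for v in VOCAB]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]  # numerically stable softmax
        seq.append(rng.choices(VOCAB, weights=weights)[0])
    return seq
```

For example, `sequence_log_prob(["mix", "flour", "bake"], toy_cond_prob)` is the log of 1/4 · 2/7 · 1/3 = 1/42: each factor conditions on everything generated so far, exactly as the factorization prescribes.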

What's included

13 readings, 2 assignments

13 readings (total 89 minutes)
  • Course Introduction (2 minutes)
  • Syllabus - Deep Learning for AI Part (10 minutes)
  • Meet Your Faculty (1 minute)
  • Academic Integrity (1 minute)
  • The Autoregressive Principle and Chain-Rule Factorization (10 minutes)
  • Key Characteristics and Applications (5 minutes)
  • LSTM Recipe Generator and the Epicurious Dataset (10 minutes)
  • Temperature Sampling and Generation Examples (10 minutes)
  • Masked Convolution and the PixelCNN Architecture (10 minutes)
  • Mask A vs. Mask B and Row-Wise Pixel Ordering (5 minutes)
  • PixelCNN on Fashion MNIST: Results and Analysis (10 minutes)
  • From GPT to ChatGPT: The RLHF Training Pipeline (10 minutes)
  • Instruction Tuning and Multi-Turn Dialogue (5 minutes)
2 assignments (total 60 minutes)
  • Assess Your Learning: The Autoregressive Principle and LSTM Text Generation (30 minutes)
  • Assess Your Learning: PixelCNN and GPT to ChatGPT (30 minutes)

Diffusion models have become the dominant paradigm for high-quality image generation, powering DALL-E 2, Imagen, and Stable Diffusion, systems you will encounter later in this course. You will work through the full framework: forward diffusion as a Markov chain, the closed-form forward process under a noise schedule, the DDPM reverse process, and the U-Net architecture used for denoising.
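A minimal NumPy sketch of the pieces named above, assuming the linear β schedule from the original DDPM paper (1e-4 to 0.02 over 1,000 steps); the course's flower example may use a different schedule. `q_sample` jumps straight to any timestep with the closed form x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, `ddpm_loss` is the simplified noise-prediction objective, and `p_sample_step` performs one reverse step given a noise prediction that a trained U-Net would supply. Timesteps are 0-indexed here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # alpha_bar_t = product of alpha_s for s <= t

def q_sample(x0, t, rng):
    """Closed-form forward process: draw x_t ~ q(x_t | x_0) in a single step."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def ddpm_loss(eps_pred, eps):
    """Simplified DDPM objective: mean squared error on the added noise."""
    return np.mean((eps_pred - eps) ** 2)

def p_sample_step(xt, t, eps_pred, rng):
    """One reverse step x_t -> x_{t-1}, with variance sigma_t^2 = beta_t."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                   # no noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
```

During training, a U-Net (hypothetical `unet(xt, t)`) would be penalized with `ddpm_loss(unet(xt, t), eps)`; at generation time, `p_sample_step` is applied from t = T − 1 down to 0, starting from pure Gaussian noise.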

What's included

1 video, 10 readings, 3 assignments

1 video (total 3 minutes)
  • Reverse Diffusion and DDPM Training (3 minutes)
10 readings (total 135 minutes)
  • Diffusion Models: Motivation and Advantages (10 minutes)
  • Comparing Diffusion Models to GANs and VAEs (10 minutes)
  • The Forward Diffusion Process and Gaussian Noise (30 minutes)
  • Closed-Form Forward Process and Noise Schedules (30 minutes)
  • The Reverse Diffusion Process and Denoising (10 minutes)
  • The DDPM Training Objective (10 minutes)
  • U-Net Architecture: Sinusoidal Time Embeddings and Residual Blocks (10 minutes)
  • Downsampling and Upsampling Paths in the U-Net (5 minutes)
  • Generating Oxford Flowers with a Diffusion Model (10 minutes)
  • Spherical Interpolation and Generated Results (10 minutes)
3 assignments (total 90 minutes)
  • Assess Your Learning: What Are Diffusion Models and the Forward Process (30 minutes)
  • Assess Your Learning: Reverse Diffusion and U-Net Architecture (30 minutes)
  • Assess Your Learning: Diffusion Model Flower Generation Example (30 minutes)

Energy-based models (EBMs) offer a unified probabilistic framework rooted in statistical physics: assign a scalar energy to every configuration of variables, with low energy indicating high probability, and train a neural network to shape that energy landscape. Because the normalizing constant is intractable, you will study Langevin dynamics for drawing samples and contrastive divergence for training, and see the framework applied to image generation.
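To see Langevin dynamics and the contrastive objective on something checkable, the sketch below uses a hypothetical quadratic energy E(x) = ½‖x − μ‖². Its Boltzmann distribution p(x) ∝ exp(−E(x)) is the Gaussian N(μ, I), so here the normalizer is known and the sampler's output can be verified. In the course setting, E would be a neural network and ∇E would come from automatic differentiation; μ, the step size, and the step count are illustrative assumptions, and the replay buffer covered in the module is omitted.

```python
import numpy as np

MU = np.array([2.0, -1.0])   # hypothetical location of the energy minimum

def energy(x):
    """Toy quadratic energy: lowest (most probable) near MU."""
    return 0.5 * np.sum((x - MU) ** 2, axis=-1)

def grad_energy(x):
    """Analytic gradient of the toy energy; a neural EBM would use autodiff."""
    return x - MU

def langevin_sample(n, steps=1000, step_size=0.01, seed=0):
    """Unadjusted Langevin dynamics: x <- x - (h/2) grad E(x) + sqrt(h) * z."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, MU.size))   # start chains from random noise
    for _ in range(steps):
        z = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * z
    return x

def contrastive_loss(x_data, x_model):
    """Contrastive-divergence-style objective: minimizing it lowers energy on
    data and raises it on samples drawn from the current model."""
    return energy(x_data).mean() - energy(x_model).mean()
```

Because the gradient step pulls samples toward low energy while the injected noise keeps them spread out, the chains settle near N(μ, I); with a learned energy, those samples are the "negatives" fed into `contrastive_loss`.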

What's included

1 video, 6 readings, 2 assignments

1 video (total 6 minutes)
  • An EBM Example (6 minutes)
6 readings (total 95 minutes)
  • The Boltzmann Distribution and Maxwell-Boltzmann (10 minutes)
  • EBM Architecture Diagrams: RBM and DBN Structure (5 minutes)
  • EBM Definition, Advantages, Applications, and Neural Energy Functions (30 minutes)
  • Langevin Dynamics and MCMC Sampling (10 minutes)
  • Contrastive Divergence and the Replay Buffer (10 minutes)
  • EBM Training Results on Fashion MNIST (30 minutes)
2 assignments (total 60 minutes)
  • Assess Your Learning: Boltzmann Distribution and Energy-Based Models (30 minutes)
  • Assess Your Learning: Training EBMs and the Fashion MNIST Example (30 minutes)

Normalizing flows complete the generative model taxonomy introduced earlier in this course. VAEs optimize a variational lower bound on the likelihood, and GANs learn an implicit density that cannot be evaluated directly; flows, by contrast, enable exact likelihood computation through invertible mappings between the data distribution and a simple base distribution. You will work through the change-of-variables formula, Jacobian determinants, and the RealNVP architecture, with GLOW and FFJORD surveyed as key extensions.
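The change-of-variables formula, log p_X(x) = log p_Z(f(x)) + log|det ∂f/∂x|, becomes cheap to evaluate with RealNVP's affine coupling layer because its Jacobian is triangular, so the log-determinant is just a sum of log-scales. The NumPy sketch below assumes a 2-D input (as in the two-moons example) and replaces the learned scale and shift networks with hypothetical fixed functions `scale` and `shift`; in practice these are trained MLPs, and training maximizes `log_prob` on the data.

```python
import numpy as np

def coupling_forward(x, mask, scale_fn, shift_fn):
    """Affine coupling: entries where mask == 1 pass through unchanged; the
    rest are scaled and shifted by functions of the pass-through entries."""
    x_fixed = x * mask
    s = scale_fn(x_fixed) * (1 - mask)
    t = shift_fn(x_fixed) * (1 - mask)
    y = x_fixed + (1 - mask) * (x * np.exp(s) + t)
    log_det = s.sum(axis=-1)   # triangular Jacobian: log|det J| = sum of s
    return y, log_det

def coupling_inverse(y, mask, scale_fn, shift_fn):
    """Exact inverse: s and t depend only on entries the layer left unchanged."""
    y_fixed = y * mask
    s = scale_fn(y_fixed) * (1 - mask)
    t = shift_fn(y_fixed) * (1 - mask)
    return y_fixed + (1 - mask) * (y - t) * np.exp(-s)

def log_prob(x, layers):
    """Change of variables with a standard-normal base distribution."""
    z, total_log_det = x, np.zeros(x.shape[0])
    for mask, scale_fn, shift_fn in layers:
        z, log_det = coupling_forward(z, mask, scale_fn, shift_fn)
        total_log_det += log_det
    d = z.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
    return log_pz + total_log_det

# Hypothetical stand-ins for the learned scale and shift networks.
rng = np.random.default_rng(0)
W_s, W_t = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

def scale(h):
    return np.tanh(h @ W_s)   # tanh keeps the log-scales bounded

def shift(h):
    return h @ W_t

# Alternating binary masks so that every coordinate is eventually transformed.
layers = [(np.array([1.0, 0.0]), scale, shift),
          (np.array([0.0, 1.0]), scale, shift)]
```

Stacking layers with alternating masks is what makes the model expressive: each layer can only transform half the coordinates, but the next layer swaps roles.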

What's included

9 readings, 3 assignments

9 readings (total 120 minutes)
  • What Are Normalizing Flows? Positioning in the Generative Landscape (5 minutes)
  • The Change-of-Variables Formula and Invertible Transformations (30 minutes)
  • The Jacobian Determinant and Its Role in Likelihood (10 minutes)
  • Composing Transformations to Build a Generative Model (10 minutes)
  • Training Objectives and Density Estimation (10 minutes)
  • RealNVP Architecture: Affine Coupling Layers (30 minutes)
  • Alternating Binary Masks and Stacked Coupling Layers (5 minutes)
  • RealNVP on Two-Moons and Density Estimation Results (10 minutes)
  • GLOW and FFJORD: Extensions of Normalizing Flows (10 minutes)
3 assignments (total 90 minutes)
  • Assess Your Learning: Change of Variables and Jacobian Determinants (30 minutes)
  • Assess Your Learning: Building a Generative Flow and RealNVP (30 minutes)
  • Assess Your Learning: RealNVP Example, GLOW, and FFJORD (30 minutes)

Multimodal models process and generate across more than one modality—text, images, audio, video—and represent the current frontier of generative AI deployment. Everything you have studied in this course converges here: Transformer-based encoders, contrastive learning objectives, and diffusion decoders combine inside systems like DALL-E 2, Imagen, and Stable Diffusion, each of which you will examine in depth.
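The contrastive objective behind CLIP can be sketched as a symmetric cross-entropy over an N × N similarity matrix: each image should score highest against its own caption, and each caption against its own image. This NumPy version is a simplification that assumes the embeddings were already produced by the text and image encoders and uses a fixed temperature of 0.07, whereas CLIP treats the temperature as a learned parameter.

```python
import numpy as np

def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss for N matched image-text pairs."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # N x N scaled cosine similarities
    idx = np.arange(len(logits))              # pair i matches pair i (diagonal)
    loss_img_to_txt = -log_softmax(logits, axis=1)[idx, idx].mean()  # rows: images
    loss_txt_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()  # cols: texts
    return (loss_img_to_txt + loss_txt_to_img) / 2
```

Every other caption in the batch acts as a negative for a given image, which is why this objective benefits from the very large batches used in pre-training at scale.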

What's included

9 readings, 1 assignment

9 readings (total 120 minutes)
  • Multimodal AI: Motivation and Real-World Applications (15 minutes)
  • Text-to-Image Generation Overview (15 minutes)
  • DALL-E 2 Architecture Overview and the Role of CLIP (10 minutes)
  • CLIP: Text and Image Encoders (5 minutes)
  • Contrastive Learning Objective and Pre-Training at Scale (5 minutes)
  • CLIP Image Prior: Autoregressive and Diffusion Approaches (15 minutes)
  • The Diffusion Decoder (GLIDE) and DALL-E 2 Applications (30 minutes)
  • Imagen: T5-XXL Language Encoder and Cascaded Diffusion (10 minutes)
  • Stable Diffusion: Latent Diffusion for High-Resolution Generation (15 minutes)
1 assignment (total 30 minutes)
  • Assess Your Learning: Multimodal Learning and CLIP (30 minutes)

Music is a domain where the generative architectures you have studied throughout this course find an unexpectedly rich application—sequential like text, spatially structured like images, and polyphonic in ways that challenge single-stream models. You will explore how Transformer-based autoregressive models generate symbolic music token-by-token, and how MuseGAN extends adversarial training to multi-track polyphonic generation in piano-roll format.
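Piano-roll format, the representation MuseGAN trains on, stores music as a binary tensor marking which pitches sound at each time step, so a chord shows up as several active cells in the same column. The converter below is a minimal sketch under assumed conventions: notes arrive as hypothetical `(track, pitch, start_step, duration_steps)` tuples, there are 128 MIDI pitches, and the `(tracks, time, pitch)` axis order is chosen for readability (MuseGAN's own tensor layout may differ).

```python
import numpy as np

def to_piano_roll(notes, n_steps, n_tracks, n_pitches=128):
    """Rasterize (track, pitch, start_step, duration_steps) notes into a
    boolean piano roll of shape (n_tracks, n_steps, n_pitches)."""
    roll = np.zeros((n_tracks, n_steps, n_pitches), dtype=bool)
    for track, pitch, start, duration in notes:
        roll[track, start:start + duration, pitch] = True   # note is held
    return roll

# Hypothetical two-track fragment: a C-major chord (MIDI 60, 64, 67) held for
# 4 steps on track 0, and a bass C (MIDI 48) held for 8 steps on track 1.
notes = [(0, 60, 0, 4), (0, 64, 0, 4), (0, 67, 0, 4), (1, 48, 0, 8)]
roll = to_piano_roll(notes, n_steps=8, n_tracks=2)
```

`roll[0].sum(axis=1)` counts the simultaneous pitches per step on track 0: 3 for the first four steps and 0 afterwards. That per-step polyphony is what a single-stream token model must serialize, whereas a multi-track piano roll presents it to the generator and critic directly.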

What's included

7 readings, 2 assignments

7 readings (total 125 minutes)
  • AI Music Generation: Approaches and Motivation (5 minutes)
  • Transformer Architecture for Music: The Two-Stream Approach (10 minutes)
  • MIDI and Piano-Roll Representation for Sequence Modeling (10 minutes)
  • Training the Music Transformer: Data, Vocabulary, and Objectives (10 minutes)
  • Temperature Sampling and Attention Heatmap Visualization (30 minutes)
  • MuseGAN Generator: Temporal Dynamics and Chord Progressions (30 minutes)
  • MuseGAN Critic and Multi-Track Piano-Roll Training (30 minutes)
2 assignments (total 60 minutes)
  • Assess Your Learning: Music Generation Intro and Transformers for Music (30 minutes)
  • Assess Your Learning: Monophonic Music Generation and MuseGAN (30 minutes)

There are no new technical lessons here—instead, you will synthesize the full arc of the course, from discriminative foundations through the generative landscape, and engage with the ethical dimensions of deploying these systems at scale: deepfakes, non-consensual generation, copyright, bias, and the governance challenges that accompany generative AI in the real world.

What's included

1 video, 4 readings

1 video (total 3 minutes)
  • Course Reflections (3 minutes)
4 readings (total 60 minutes)
  • Course Synthesis: From Discriminative to Generative AI (30 minutes)
  • Ethical Implications: Deepfakes, Non-Consensual Generation, and Copyright (5 minutes)
  • Responsible AI: Governance, Bias, and Societal Impact (15 minutes)
  • Congratulations! (10 minutes)

Instructor

Northeastern University
8 courses, 1,167 learners

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

Financial aid available.