Deep Learning for AI Part 2
Details to know
June 2026
13 assignments
There are 7 modules in this course
This is Part 2 of a two-part graduate sequence in deep learning. Building on the foundations from Part 1, it focuses on advanced generative modeling. You will study autoregressive models, diffusion models, energy-based models, and normalizing flows; see how these techniques converge in multimodal text-to-image systems such as CLIP, DALL-E 2, Imagen, and Stable Diffusion; and apply generative methods to creative domains such as music generation. The course concludes by synthesizing the full arc—from discriminative foundations to advanced generative AI—and examining the ethical and societal implications of deploying these systems.
Autoregressive models are built on a deceptively simple principle: the joint probability of a sequence is the product of conditional probabilities of each element given all preceding elements. You will see this chain-rule factorization applied across three concrete systems—an LSTM recipe generator, PixelCNN for image synthesis, and the path from GPT to ChatGPT through reinforcement learning from human feedback.
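The chain-rule factorization described above, p(x_1, ..., x_T) = p(x_1) · p(x_2 | x_1) · ... · p(x_T | x_1, ..., x_{T-1}), can be sketched in a few lines. This is a minimal illustration, not course material: the 3-token vocabulary, the `toy_cond_prob` function, and the random probability tables are hypothetical stand-ins for a trained LSTM, PixelCNN, or GPT-style network.

```python
import itertools
import numpy as np

def sequence_log_prob(tokens, cond_prob):
    """Chain rule: log p(x) is the sum over t of log p(x_t | x_<t)."""
    return sum(
        np.log(cond_prob(tokens[:t])[tokens[t]]) for t in range(len(tokens))
    )

# Hypothetical toy conditional model over a 3-token vocabulary.
# A real autoregressive model would replace these tables with a network.
VOCAB = 3
rng = np.random.default_rng(0)
START = rng.dirichlet(np.ones(VOCAB))              # p(x_1)
TRANS = rng.dirichlet(np.ones(VOCAB), size=VOCAB)  # p(x_t | x_{t-1})

def toy_cond_prob(prefix):
    """Return the next-token distribution given the prefix."""
    return START if len(prefix) == 0 else TRANS[prefix[-1]]

# Because every conditional is a valid distribution, the joint
# probabilities of all length-4 sequences add up to one.
total = sum(
    np.exp(sequence_log_prob(list(seq), toy_cond_prob))
    for seq in itertools.product(range(VOCAB), repeat=4)
)
print(total)
```

The toy model only looks at the last token of the prefix; the factorization itself places no such restriction, which is why the same scoring function applies to models that condition on the full history.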
What's included
13 readings•2 assignments
13 readings•Total 89 minutes
- Course Introduction•2 minutes
- Syllabus - Deep Learning for AI Part 2•10 minutes
- Meet Your Faculty•1 minute
- Academic Integrity•1 minute
- The Autoregressive Principle and Chain-Rule Factorization•10 minutes
- Key Characteristics and Applications•5 minutes
- LSTM Recipe Generator and the Epicurious Dataset•10 minutes
- Temperature Sampling and Generation Examples•10 minutes
- Masked Convolution and the PixelCNN Architecture•10 minutes
- Mask A vs. Mask B and Row-Wise Pixel Ordering•5 minutes
- PixelCNN on Fashion MNIST: Results and Analysis•10 minutes
- From GPT to ChatGPT: The RLHF Training Pipeline•10 minutes
- Instruction Tuning and Multi-Turn Dialogue•5 minutes
2 assignments•Total 60 minutes
- Assess Your Learning: The Autoregressive Principle and LSTM Text Generation•30 minutes
- Assess Your Learning: PixelCNN and GPT to ChatGPT•30 minutes
Diffusion models have become the dominant paradigm for high-quality image generation, powering DALL-E 2, Imagen, and Stable Diffusion, systems you will encounter later in this course. You will work through the full framework: forward diffusion as a Markov chain, the closed-form forward process and its noise schedule, the DDPM reverse process, and the U-Net architecture used for denoising.
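As a rough sketch of the closed-form forward process: with per-step noise levels beta_t and alpha_bar_t defined as the running product of (1 - beta_s), a noisy sample at any step t can be drawn directly as x_t = sqrt(alpha_bar_t) · x_0 + sqrt(1 - alpha_bar_t) · noise, without simulating the chain step by step. The linear schedule values, the step count, and the function names below are assumptions chosen for illustration, not values taken from the course.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the course may use different values."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in one shot using the closed form."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)   # fraction of signal variance left at each step

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a normalized image
x_t, noise = q_sample(x0, 500, alpha_bar, rng)
print(alpha_bar[0], alpha_bar[-1])
```

In DDPM-style training, the `noise` returned here is the regression target: the denoising network receives `x_t` and `t` and is trained to predict it.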
What's included
1 video•10 readings•3 assignments
1 video•Total 3 minutes
- Reverse Diffusion and DDPM Training•3 minutes
10 readings•Total 135 minutes
- Diffusion Models: Motivation and Advantages•10 minutes
- Comparing Diffusion Models to GANs and VAEs•10 minutes
- The Forward Diffusion Process and Gaussian Noise•30 minutes
- Closed-Form Forward Process and Noise Schedules•30 minutes
- The Reverse Diffusion Process and Denoising•10 minutes
- The DDPM Training Objective•10 minutes
- U-Net Architecture: Sinusoidal Time Embeddings and Residual Blocks•10 minutes
- Downsampling and Upsampling Paths in the U-Net•5 minutes
- Generating Oxford Flowers with a Diffusion Model•10 minutes
- Spherical Interpolation and Generated Results•10 minutes
3 assignments•Total 90 minutes
- Assess Your Learning: What Are Diffusion Models and the Forward Process•30 minutes
- Assess Your Learning: Reverse Diffusion and U-Net Architecture•30 minutes
- Assess Your Learning: Diffusion Model Flower Generation Example•30 minutes
Energy-based models offer a unified probabilistic framework rooted in statistical physics: assign a scalar energy to every configuration of variables, with low energy indicating high probability, and train a neural network to shape that landscape. You will study Langevin dynamics and contrastive divergence as approaches to training under intractable normalization, and see the framework applied to image generation.
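To make the link between energy and probability concrete, the sketch below samples from p(x) proportional to exp(-E(x)) with unadjusted Langevin dynamics, using only the gradient of the energy and never the normalizing constant. The quadratic energy with an analytic gradient is an assumption made so the result is easy to check; in the setting the module describes, a neural network would supply the energy and its gradient would come from automatic differentiation. Step size and step count are likewise illustrative.

```python
import numpy as np

def energy(x, mu):
    """Toy quadratic energy: lowest (most probable) at x == mu."""
    return 0.5 * np.sum((x - mu) ** 2, axis=-1)

def energy_grad(x, mu):
    """Analytic gradient of the toy energy with respect to x."""
    return x - mu

def langevin_sample(x_init, mu, step_size=0.05, num_steps=500, rng=None):
    """Unadjusted Langevin dynamics: gradient step on the energy plus noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = x_init.copy()
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * energy_grad(x, mu) + np.sqrt(step_size) * noise
    return x

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])
x_init = rng.uniform(-5.0, 5.0, size=(2000, 2))   # 2000 independent chains
samples = langevin_sample(x_init, mu, rng=rng)
print(samples.mean(axis=0))   # should land near mu
```

For this energy the target distribution is a unit-variance Gaussian centred on `mu`, so the sample mean gives a quick sanity check that the chains have settled into the low-energy region.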
What's included
1 video•6 readings•2 assignments
1 video•Total 6 minutes
- An EBM Example•6 minutes
6 readings•Total 95 minutes
- The Boltzmann Distribution and Maxwell-Boltzmann•10 minutes
- EBM Architecture Diagrams: RBM and DBN Structure•5 minutes
- EBM Definition, Advantages, Applications, and Neural Energy Functions•30 minutes
- Langevin Dynamics and MCMC Sampling•10 minutes
- Contrastive Divergence and the Replay Buffer•10 minutes
- EBM Training Results on Fashion MNIST•30 minutes
2 assignments•Total 60 minutes
- Assess Your Learning: Boltzmann Distribution and Energy-Based Models•30 minutes
- Assess Your Learning: Training EBMs and the Fashion MNIST Example•30 minutes
Normalizing flows complete the generative model taxonomy introduced earlier in this course. Unlike VAEs, which optimize a variational lower bound, or GANs, which define a density only implicitly, flows enable exact likelihood computation through invertible mappings between the data distribution and a simple base distribution. You will work through the change-of-variables formula, Jacobian determinants, and the RealNVP architecture, with GLOW and FFJORD surveyed as key extensions.
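The change-of-variables formula, log p_x(x) = log p_z(f(x)) + log |det J_f(x)|, becomes cheap to evaluate when the Jacobian is triangular, which is the idea behind affine coupling layers. The following is a minimal single-layer sketch in the style of RealNVP, not the course's implementation: `scale_and_shift` uses fixed hypothetical functions where a real model would use trained networks, and the base distribution is assumed to be a standard Gaussian.

```python
import numpy as np

def scale_and_shift(x1):
    """Hypothetical fixed functions standing in for trained networks."""
    return np.tanh(x1), 0.5 * x1

def coupling_forward(x):
    """Map data x to latent y; the first half passes through unchanged."""
    x1, x2 = np.split(x, 2, axis=-1)
    s, t = scale_and_shift(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)
    log_det = np.sum(s, axis=-1)   # Jacobian is triangular with diagonal (1, exp(s))
    return y, log_det

def coupling_inverse(y):
    """Exact inverse: s and t are recomputed from the untouched half."""
    y1, y2 = np.split(y, 2, axis=-1)
    s, t = scale_and_shift(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

def log_likelihood(x):
    """Exact log-density under a standard Gaussian base distribution."""
    z, log_det = coupling_forward(x)
    dim = z.shape[-1]
    log_base = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return log_base + log_det

x = np.array([[0.3, -1.2, 0.7, 2.0]])
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
print(np.allclose(x_rec, x), log_likelihood(x))
```

Because one half of the input is left unchanged in each layer, practical flows stack several such layers and alternate which half is transformed.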
What's included
9 readings•3 assignments
9 readings•Total 120 minutes
- What Are Normalizing Flows? Positioning in the Generative Landscape•5 minutes
- The Change-of-Variables Formula and Invertible Transformations•30 minutes
- The Jacobian Determinant and Its Role in Likelihood•10 minutes
- Composing Transformations to Build a Generative Model•10 minutes
- Training Objectives and Density Estimation•10 minutes
- RealNVP Architecture: Affine Coupling Layers•30 minutes
- Alternating Binary Masks and Stacked Coupling Layers•5 minutes
- RealNVP on Two-Moons and Density Estimation Results•10 minutes
- GLOW and FFJORD: Extensions of Normalizing Flows•10 minutes
3 assignments•Total 90 minutes
- Assess Your Learning: Change of Variables and Jacobian Determinants•30 minutes
- Assess Your Learning: Building a Generative Flow and RealNVP•30 minutes
- Assess Your Learning: RealNVP Example, GLOW, and FFJORD•30 minutes
Multimodal models process and generate across more than one modality—text, images, audio, video—and represent the current frontier of generative AI deployment. Everything you have studied in this course converges here: Transformer-based encoders, contrastive learning objectives, and diffusion decoders combine inside systems like DALL-E 2, Imagen, and Stable Diffusion, each of which you will examine in depth.
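One of the ingredients named above, the contrastive learning objective, can be sketched as a symmetric cross-entropy over an image-text similarity matrix: matching pairs sit on the diagonal and every other entry in the batch acts as a negative. This is an illustrative NumPy version under stated assumptions, not CLIP's actual code: the embeddings are random stand-ins for encoder outputs, and the fixed `temperature` value is an assumption (in practice it is typically a learned parameter).

```python
import numpy as np

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is a matching pair."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature   # scaled cosine similarities
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()   # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()   # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
image_emb = rng.standard_normal((4, 16))                     # stand-in image embeddings
text_emb = image_emb + 0.01 * rng.standard_normal((4, 16))   # nearly aligned captions

aligned = clip_style_loss(image_emb, text_emb)
misaligned = clip_style_loss(image_emb, np.roll(text_emb, 1, axis=0))
print(aligned, misaligned)   # aligned pairs should give the lower loss
```

Rolling the text embeddings by one row pairs each image with the wrong caption, so the loss rises; minimizing it pulls matching pairs together in the shared embedding space.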
What's included
9 readings•1 assignment
9 readings•Total 120 minutes
- Multimodal AI: Motivation and Real-World Applications•15 minutes
- Text-to-Image Generation Overview•15 minutes
- DALL-E 2 Architecture Overview and the Role of CLIP•10 minutes
- CLIP: Text and Image Encoders•5 minutes
- Contrastive Learning Objective and Pre-Training at Scale•5 minutes
- CLIP Image Prior: Autoregressive and Diffusion Approaches•15 minutes
- The Diffusion Decoder (GLIDE) and DALL-E 2 Applications•30 minutes
- Imagen: T5-XXL Language Encoder and Cascaded Diffusion•10 minutes
- Stable Diffusion: Latent Diffusion for High-Resolution Generation•15 minutes
1 assignment•Total 30 minutes
- Assess Your Learning: Multimodal Learning and CLIP•30 minutes
Music is a domain where the generative architectures you have studied throughout this course find an unexpectedly rich application—sequential like text, spatially structured like images, and polyphonic in ways that challenge single-stream models. You will explore how Transformer-based autoregressive models generate symbolic music token-by-token, and how MuseGAN extends adversarial training to multi-track polyphonic generation in piano-roll format.
What's included
7 readings•2 assignments
7 readings•Total 125 minutes
- AI Music Generation: Approaches and Motivation•5 minutes
- Transformer Architecture for Music: The Two-Stream Approach•10 minutes
- MIDI and Piano-Roll Representation for Sequence Modeling•10 minutes
- Training the Music Transformer: Data, Vocabulary, and Objectives•10 minutes
- Temperature Sampling and Attention Heatmap Visualization•30 minutes
- MuseGAN Generator: Temporal Dynamics and Chord Progressions•30 minutes
- MuseGAN Critic and Multi-Track Piano-Roll Training•30 minutes
2 assignments•Total 60 minutes
- Assess Your Learning: Music Generation Intro and Transformers for Music•30 minutes
- Assess Your Learning: Monophonic Music Generation and MuseGAN•30 minutes
There are no new technical lessons here—instead, you will synthesize the full arc of the course, from discriminative foundations through the generative landscape, and engage with the ethical dimensions of deploying these systems at scale: deepfakes, non-consensual generation, copyright, bias, and the governance challenges that accompany generative AI in the real world.
What's included
1 video•4 readings
1 video•Total 3 minutes
- Course Reflections•3 minutes
4 readings•Total 60 minutes
- Course Synthesis: From Discriminative to Generative AI•30 minutes
- Ethical Implications: Deepfakes, Non-Consensual Generation, and Copyright•5 minutes
- Responsible AI: Governance, Bias, and Societal Impact•15 minutes
- Congratulations!•10 minutes
