Deep Learning for AI Part 2

Instructor: Xuemin Jin

7 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Some related experience required

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

Details to know

Shareable certificate

Add to your LinkedIn profile

There are 7 modules in this course

This is Part 2 of a two-part graduate sequence in deep learning. Building on the foundations from Part 1, it focuses on advanced generative modeling. You will study autoregressive models, diffusion models, energy-based models, and normalizing flows; see how these techniques converge in multimodal text-to-image systems such as CLIP, DALL-E 2, Imagen, and Stable Diffusion; and apply generative methods to creative domains such as music generation. The course concludes by synthesizing the full arc—from discriminative foundations to advanced generative AI—and examining the ethical and societal implications of deploying these systems.

Autoregressive models are built on a deceptively simple principle: the joint probability of a sequence is the product of conditional probabilities of each element given all preceding elements. You will see this chain-rule factorization applied across three concrete systems—an LSTM recipe generator, PixelCNN for image synthesis, and the path from GPT to ChatGPT through reinforcement learning from human feedback.
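
The chain-rule factorization described above can be sketched in a few lines. This is a minimal illustration, not the course's code: `cond_prob` is a hypothetical hand-written conditional model over a two-symbol vocabulary, standing in for the LSTM, PixelCNN, or GPT network that would supply p(x_i | x_1, ..., x_{i-1}) in practice.

```python
import math

def cond_prob(prefix, token):
    """Hypothetical toy model of p(token | prefix) over the vocabulary {"a", "b"}.

    For brevity it only looks at the last symbol of the prefix, but the
    interface accepts the full prefix, as an autoregressive model would.
    """
    if not prefix:
        return 0.5                      # uniform over the first symbol
    stay = 0.8                          # assumed probability of repeating the last symbol
    return stay if token == prefix[-1] else 1.0 - stay

def sequence_log_prob(seq):
    """log p(x_1..x_n) = sum_i log p(x_i | x_1..x_{i-1})  (chain rule)."""
    return sum(math.log(cond_prob(seq[:i], tok)) for i, tok in enumerate(seq))

seq = ["a", "a", "b"]
log_p = sequence_log_prob(seq)
# Joint probability is the product of the conditionals: 0.5 * 0.8 * 0.2
print(round(math.exp(log_p), 6))  # 0.08
```

Summing log-probabilities rather than multiplying probabilities is the usual choice because products of many small conditionals underflow quickly on long sequences.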

What's included

13 readings•2 assignments

13 readings•Total 89 minutes

Course Introduction•2 minutes
Syllabus - Deep Learning for AI Part 2•10 minutes
Meet Your Faculty•1 minute
Academic Integrity•1 minute
The Autoregressive Principle and Chain-Rule Factorization•10 minutes
Key Characteristics and Applications•5 minutes
LSTM Recipe Generator and the Epicurious Dataset•10 minutes
Temperature Sampling and Generation Examples•10 minutes
Masked Convolution and the PixelCNN Architecture•10 minutes
Mask A vs. Mask B and Row-Wise Pixel Ordering•5 minutes
PixelCNN on Fashion MNIST: Results and Analysis•10 minutes
From GPT to ChatGPT: The RLHF Training Pipeline•10 minutes
Instruction Tuning and Multi-Turn Dialogue•5 minutes
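
Temperature sampling, named in the readings above, can be sketched as follows. This is an illustrative version under assumptions: the function names are hypothetical, the input is a list of strictly positive next-token probabilities, and the rescaling p_i^(1/T) / sum_j p_j^(1/T) is the standard formulation rather than anything confirmed from the course materials.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a distribution: p_i^(1/T) / sum_j p_j^(1/T), computed in log space."""
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]

probs = [0.7, 0.2, 0.1]                  # assumed model output for three tokens
sharp = apply_temperature(probs, 0.5)    # T < 1 concentrates mass on the top token
flat = apply_temperature(probs, 2.0)     # T > 1 spreads mass toward unlikely tokens
token = sample_token(probs, 1.0, random.Random(0))
```

With T = 1 the distribution is returned unchanged; lowering T makes generation more deterministic, raising it makes output more varied at the cost of coherence.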

2 assignments•Total 60 minutes

Assess Your Learning: The Autoregressive Principle and LSTM Text Generation•30 minutes
Assess Your Learning: PixelCNN and GPT to ChatGPT•30 minutes
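
The Mask A versus Mask B distinction covered in this module can be illustrated with a small mask builder. This is a sketch assuming a single-channel image and row-wise (raster-scan) pixel ordering; `pixelcnn_mask` is a hypothetical helper name, and the colour-channel ordering used by full PixelCNN implementations is deliberately left out.

```python
def pixelcnn_mask(kernel_size, mask_type):
    """Binary mask for a square convolution kernel under raster-scan ordering.

    Positions above the centre row, or left of the centre in the same row,
    are visible. Mask "A" hides the centre pixel itself; mask "B" reveals it.
    """
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    center = kernel_size // 2
    mask = [[0] * kernel_size for _ in range(kernel_size)]
    for r in range(kernel_size):
        for c in range(kernel_size):
            above = r < center
            left_in_row = r == center and c < center
            is_center = r == center and c == center
            if above or left_in_row or (is_center and mask_type == "B"):
                mask[r][c] = 1
    return mask

mask_a = pixelcnn_mask(3, "A")  # [[1, 1, 1], [1, 0, 0], [0, 0, 0]]
mask_b = pixelcnn_mask(3, "B")  # [[1, 1, 1], [1, 1, 0], [0, 0, 0]]
```

Mask A is typically used only in the first layer, so that a pixel's prediction never sees its own input value; later layers can use Mask B because their centre position already holds features computed from preceding pixels only.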

Diffusion models have become the dominant paradigm for high-quality image generation, powering DALL-E 2, Imagen, and Stable Diffusion, systems you will encounter later in this course. You will work through the full framework: forward diffusion as a Markov chain, the closed-form noise schedule, the DDPM reverse process, and the U-Net architecture used for denoising.
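
The closed-form forward process mentioned above lets you jump from a clean sample x_0 to any noise level t without simulating the Markov chain step by step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with alpha_bar_t the running product of (1 - beta_s). The sketch below assumes a linear schedule from 1e-4 to 0.02 over 1000 steps, the values used in the original DDPM paper; the course may use a different schedule, and all function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances beta_1..beta_T (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 0-based index into the schedule."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]                    # stand-in for flattened pixel values
x_early = q_sample(x0, 0, alpha_bars, rng)    # almost identical to x0
x_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
```

Because alpha_bar shrinks toward zero as t grows, the signal term fades and the final step is close to a standard normal sample, which is what the reverse process starts from.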

What's included

1 video•10 readings•3 assignments

1 video•Total 3 minutes

Reverse Diffusion and DDPM Training•3 minutes

10 readings•Total 135 minutes

Diffusion Models: Motivation and Advantages•10 minutes
Comparing Diffusion Models to GANs and VAEs•10 minutes
The Forward Diffusion Process and Gaussian Noise•30 minutes
Closed-Form Forward Process and Noise Schedules•30 minutes
The Reverse Diffusion Process and Denoising•10 minutes
The DDPM Training Objective•10 minutes
U-Net Architecture: Sinusoidal Time Embeddings and Residual Blocks•10 minutes
Downsampling and Upsampling Paths in the U-Net•5 minutes
Generating Oxford Flowers with a Diffusion Model•10 minutes
Spherical Interpolation and Generated Results•10 minutes

3 assignments•Total 90 minutes

Assess Your Learning: What Are Diffusion Models and the Forward Process•30 minutes
Assess Your Learning: Reverse Diffusion and U-Net Architecture•30 minutes
Assess Your Learning: Diffusion Model Flower Generation Example•30 minutes

Energy-Based Models offer a unified probabilistic framework rooted in statistical physics: assign a scalar energy to every configuration of variables, with low energy indicating high probability, and train a neural network to shape that landscape. You will study Langevin dynamics and contrastive divergence as approaches to training under intractable normalization, and see the framework applied to image generation.
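
Langevin dynamics, as described above, draws samples from p(x) proportional to exp(-E(x)) by following the negative energy gradient while injecting Gaussian noise. The sketch below is a one-dimensional illustration under stated assumptions: the energy is the hand-picked E(x) = 0.5 * x^2 (whose target distribution is a standard normal) rather than a neural network, and the step size and step count are arbitrary illustrative choices.

```python
import math
import random

def energy_grad(x):
    """Gradient of the toy energy E(x) = 0.5 * x**2 (an assumption, not a learned model)."""
    return x

def langevin_sample(x_init, num_steps, step_size, rng):
    """Unadjusted Langevin update: x <- x - (eps / 2) * dE/dx + sqrt(eps) * noise."""
    x = x_init
    for _ in range(num_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - 0.5 * step_size * energy_grad(x) + math.sqrt(step_size) * noise
    return x

rng = random.Random(0)
# Start chains far from the mode and let the dynamics pull them into low-energy regions.
samples = [langevin_sample(rng.uniform(-5.0, 5.0), 200, 0.1, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance land near 0 and 1, with a small bias from the finite step size. In an EBM the analytic gradient would be replaced by automatic differentiation through the energy network with respect to its input.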

What's included

1 video•6 readings•2 assignments

1 video•Total 6 minutes

An EBM Example•6 minutes

6 readings•Total 95 minutes

The Boltzmann Distribution and Maxwell-Boltzmann•10 minutes
EBM Architecture Diagrams: RBM and DBN Structure•5 minutes
EBM Definition, Advantages, Applications, and Neural Energy Functions•30 minutes
Langevin Dynamics and MCMC Sampling•10 minutes
Contrastive Divergence and the Replay Buffer•10 minutes
EBM Training Results on Fashion MNIST•30 minutes

2 assignments•Total 60 minutes

Assess Your Learning: Boltzmann Distribution and Energy-Based Models•30 minutes
Assess Your Learning: Training EBMs and the Fashion MNIST Example•30 minutes

Normalizing flows complete the generative model taxonomy introduced earlier in this course. Unlike VAEs, which optimize a variational lower bound on the likelihood, or GANs, which define the data density only implicitly and never evaluate it, flows enable exact likelihood computation through invertible mappings between the data distribution and a simple base distribution. You will work through the change-of-variables formula, Jacobian determinants, and the RealNVP architecture, with GLOW and FFJORD surveyed as key extensions.
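
The affine coupling layer at the heart of RealNVP shows why both inversion and the Jacobian determinant stay cheap: half the input passes through unchanged and parameterizes an elementwise scale and shift of the other half, so the Jacobian is triangular and its log-determinant is just the sum of the log-scales. The sketch below is a minimal illustration; `scale_and_shift` is a hypothetical fixed function standing in for the small neural networks s(.) and t(.), and the input length is assumed to be even.

```python
import math

def scale_and_shift(x_a):
    """Hypothetical stand-in for the learned networks s(x_a) and t(x_a)."""
    s = [math.tanh(v) for v in x_a]      # log-scale, bounded for stability
    t = [0.5 * v for v in x_a]           # shift
    return s, t

def coupling_forward(x):
    """y_a = x_a;  y_b = x_b * exp(s(x_a)) + t(x_a);  log|det J| = sum(s)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = scale_and_shift(x_a)
    y_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    return x_a + y_b, sum(s)

def coupling_inverse(y):
    """Exact inverse: x_b = (y_b - t(y_a)) * exp(-s(y_a)), since y_a equals x_a."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    s, t = scale_and_shift(y_a)
    x_b = [(yb - ti) * math.exp(-si) for yb, si, ti in zip(y_b, s, t)]
    return y_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)              # recovers x up to floating-point error
```

Under the change-of-variables formula, the exact log-likelihood is log p_x(x) = log p_z(f(x)) + log|det J|, so each layer only needs to return its transformed output and `log_det`. Stacking layers with alternating masks ensures every dimension is eventually transformed.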

What's included

9 readings•3 assignments

9 readings•Total 120 minutes

What Are Normalizing Flows? Positioning in the Generative Landscape•5 minutes
The Change-of-Variables Formula and Invertible Transformations•30 minutes
The Jacobian Determinant and Its Role in Likelihood•10 minutes
Composing Transformations to Build a Generative Model•10 minutes
Training Objectives and Density Estimation•10 minutes
RealNVP Architecture: Affine Coupling Layers•30 minutes
Alternating Binary Masks and Stacked Coupling Layers•5 minutes
RealNVP on Two-Moons and Density Estimation Results•10 minutes
GLOW and FFJORD: Extensions of Normalizing Flows•10 minutes

3 assignments•Total 90 minutes

Assess Your Learning: Change of Variables and Jacobian Determinants•30 minutes
Assess Your Learning: Building a Generative Flow and RealNVP•30 minutes
Assess Your Learning: RealNVP Example, GLOW, and FFJORD•30 minutes

Multimodal models process and generate across more than one modality—text, images, audio, video—and represent the current frontier of generative AI deployment. Everything you have studied in this course converges here: Transformer-based encoders, contrastive learning objectives, and diffusion decoders combine inside systems like DALL-E 2, Imagen, and Stable Diffusion, each of which you will examine in depth.
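
The contrastive objective behind CLIP can be sketched as a symmetric cross-entropy over a batch of paired image and text embeddings: each image should score highest against its own caption, and each caption against its own image. The code below is a simplified pure-Python illustration, not CLIP's implementation. The function names are hypothetical, the embeddings are hand-written vectors rather than encoder outputs, and the temperature is fixed at an illustrative 0.07 whereas CLIP learns it during training.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(r - peak) for r in row))
    return [r - lse for r in row]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair k of images matches pair k of texts."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    # Image-to-text direction: softmax across each row, target is the diagonal.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: softmax across each column.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when matching pairs point in the same direction and large when the pairing is scrambled, which is the signal that pulls the two encoders into a shared embedding space.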

What's included

9 readings•1 assignment

9 readings•Total 120 minutes

Multimodal AI: Motivation and Real-World Applications•15 minutes
Text-to-Image Generation Overview•15 minutes
DALL-E 2 Architecture Overview and the Role of CLIP•10 minutes
CLIP: Text and Image Encoders•5 minutes
Contrastive Learning Objective and Pre-Training at Scale•5 minutes
CLIP Image Prior: Autoregressive and Diffusion Approaches•15 minutes
The Diffusion Decoder (GLIDE) and DALL-E 2 Applications•30 minutes
Imagen: T5-XXL Language Encoder and Cascaded Diffusion•10 minutes
Stable Diffusion: Latent Diffusion for High-Resolution Generation•15 minutes

1 assignment•Total 30 minutes

Assess Your Learning: Multimodal Learning and CLIP•30 minutes

Music is a domain where the generative architectures you have studied throughout this course find an unexpectedly rich application—sequential like text, spatially structured like images, and polyphonic in ways that challenge single-stream models. You will explore how Transformer-based autoregressive models generate symbolic music token-by-token, and how MuseGAN extends adversarial training to multi-track polyphonic generation in piano-roll format.
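
The piano-roll format mentioned above represents music as a time-by-pitch binary grid, which is what makes polyphony awkward for single-stream token models: several cells in one row can be active at once. The sketch below converts a list of note events into such a grid. It assumes a single track, a hypothetical `(pitch, start_step, end_step)` event format with exclusive end, and the 128 pitch values of the MIDI standard; multi-track systems add a further track dimension.

```python
def notes_to_piano_roll(notes, num_steps, num_pitches=128):
    """Build a num_steps x num_pitches binary grid from (pitch, start, end) events."""
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for pitch, start, end in notes:
        # Clip each note to the grid so out-of-range events are ignored safely.
        for step in range(max(start, 0), min(end, num_steps)):
            roll[step][pitch] = 1
    return roll

# A C major triad (MIDI pitches 60, 64, 67) held for four steps,
# followed by a single E for two steps, then silence.
notes = [(60, 0, 4), (64, 0, 4), (67, 0, 4), (64, 4, 6)]
roll = notes_to_piano_roll(notes, num_steps=8)
active_per_step = [sum(row) for row in roll]
print(active_per_step)  # [3, 3, 3, 3, 1, 1, 0, 0]
```

Rows with more than one active pitch are the polyphonic moments; a monophonic token sequence would have at most one active cell per row.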

What's included

7 readings•2 assignments

7 readings•Total 125 minutes

AI Music Generation: Approaches and Motivation•5 minutes
Transformer Architecture for Music: The Two-Stream Approach•10 minutes
MIDI and Piano-Roll Representation for Sequence Modeling•10 minutes
Training the Music Transformer: Data, Vocabulary, and Objectives•10 minutes
Temperature Sampling and Attention Heatmap Visualization•30 minutes
MuseGAN Generator: Temporal Dynamics and Chord Progressions•30 minutes
MuseGAN Critic and Multi-Track Piano-Roll Training•30 minutes

2 assignments•Total 60 minutes

Assess Your Learning: Music Generation Intro and Transformers for Music•30 minutes
Assess Your Learning: Monophonic Music Generation and MuseGAN•30 minutes

There are no new technical lessons here—instead, you will synthesize the full arc of the course, from discriminative foundations through the generative landscape, and engage with the ethical dimensions of deploying these systems at scale: deepfakes, non-consensual generation, copyright, bias, and the governance challenges that accompany generative AI in the real world.

What's included

1 video•4 readings

1 video•Total 3 minutes

Course Reflections•3 minutes

4 readings•Total 60 minutes

Course Synthesis: From Discriminative to Generative AI•30 minutes
Ethical Implications: Deepfakes, Non-Consensual Generation, and Copyright•5 minutes
Responsible AI: Governance, Bias, and Societal Impact•15 minutes
Congratulations!•10 minutes

Instructor

Xuemin Jin

Northeastern University

8 Courses•1,167 learners

Offered by

Northeastern University

URL: https://www.coursera.org/learn/deep-learning-for-ai-part-2