URL: https://www.coursera.org/learn/multi-modal-ai

Multi-modal AI

This course is part of the AI Tooling Specialization

Gain insight into a topic and learn the fundamentals.
Beginner level

Recommended experience

3 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Apply multi-modal AI techniques to convert screenshots into working code, using prompt engineering with visual context and GitHub Copilot

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

April 2026

Assessments

1 assignment

Taught in English

Build your subject-matter expertise

This course is part of the AI Tooling Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 3 modules in this course

Learn to build production applications by combining visual and textual inputs with AI coding tools. You will explore multi-modal programming where screenshots, images, and text serve as inputs for AI-assisted code generation, and set up development environments configured for visual AI workflows. The course covers prompt engineering with visual context to improve code generation accuracy, and hands-on development with GitHub Copilot in VS Code for inline suggestions and chat-based interactions. You will build a complete project using live reload and browser developer tools for rapid feedback between AI generation and visual output.

The iterative development module teaches documentation-driven design where documentation guides AI toward desired outcomes, image-based iteration for refining generated code through visual comparison, and automated checks and validations that maintain quality through development cycles. You will learn to identify and overcome common iteration challenges including regression and context drift.

The advanced module covers Model Context Protocol for connecting AI tools with external capabilities, Playwright for browser automation and visual testing, and Playwright MCP for AI-driven browser interactions that validate web applications directly. By completing this course, you will be able to convert screenshots into production code through iterative, automated, multi-modal AI workflows.
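As a rough illustration of the "visual comparison" and "automated checks" ideas mentioned above, the following minimal Python sketch compares a target image with a rendered one and reports how many pixels differ. It is not taken from the course materials: the function names `diff_ratio` and `passes_visual_check`, the nested-list image representation, and the threshold values are all assumptions chosen for illustration. A real workflow would more likely load screenshots with an imaging library or a browser automation tool.

```python
from typing import List, Tuple

# Hypothetical in-memory image format: rows of (R, G, B) tuples.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_ratio(expected: Image, actual: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose channels differ by more than `tolerance`."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    total = 0
    changed = 0
    for row_e, row_a in zip(expected, actual):
        for px_e, px_a in zip(row_e, row_a):
            total += 1
            # A pixel counts as changed if any channel is off by more than the tolerance.
            if any(abs(c_e - c_a) > tolerance for c_e, c_a in zip(px_e, px_a)):
                changed += 1
    return changed / total if total else 0.0


def passes_visual_check(
    expected: Image, actual: Image, max_ratio: float = 0.01, tolerance: int = 8
) -> bool:
    """Assumed acceptance rule: at most `max_ratio` of the pixels may differ."""
    return diff_ratio(expected, actual, tolerance) <= max_ratio


if __name__ == "__main__":
    white, red = (255, 255, 255), (255, 0, 0)
    target = [[white, white], [white, white]]
    render = [[white, white], [white, red]]
    print(diff_ratio(target, render))           # 0.25 (one of four pixels differs)
    print(passes_visual_check(target, render))  # False
```

A check like this could run after each AI-generated change, so that a regression shows up as a rising diff ratio rather than being spotted by eye.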

Covers multi-modal, screenshots, overview, programming, and visual.

What's included

15 videos, 6 readings

15 videos • Total 52 minutes
  • Course Introduction • 1 minute
  • What Is Multi-Modal Programming • 3 minutes
  • Setting Up Multi-Modal Dev Environments • 6 minutes
  • Your First Screenshot to Code Conversion • 6 minutes
  • Lesson 1.1 Conclusion • 0 minutes
  • Prompt Engineering Introduction • 1 minute
  • Prompt Engineering with Visual Context • 5 minutes
  • Introduction to GitHub Copilot and VS Code • 5 minutes
  • Developing with GitHub Copilot • 5 minutes
  • Lesson 1.2 Conclusion • 1 minute
  • Building Introduction • 1 minute
  • What Will We Build • 4 minutes
  • Live Reload and Developer Tools • 7 minutes
  • Setting Up the Development Environment • 7 minutes
  • Lesson 1.3 Conclusion • 1 minute
6 readings • Total 6 minutes
  • Key Terms • 1 minute
  • Reflection • 1 minute
  • Key Terms • 1 minute
  • Reflection • 1 minute
  • Key Terms • 1 minute
  • Reflection • 1 minute

Covers iterative, documentation, iteration, designing, and context.

What's included

12 videos, 4 readings

12 videos • Total 39 minutes
  • MCP and Automation Introduction • 1 minute
  • Introduction to MCP • 4 minutes
  • Overview of Playwright • 4 minutes
  • Using Playwright MCP • 6 minutes
  • Overview of What We Built • 3 minutes
  • Course Conclusion • 1 minute
  • Iterative Development Introduction • 1 minute
  • Designing with Documentation • 4 minutes
  • Iterating Over First Changes • 4 minutes
  • Using Images for Iteration • 6 minutes
  • Challenges with Iteration • 3 minutes
  • Automating Checks and Validations • 4 minutes
4 readings • Total 40 minutes
  • Key Terms • 10 minutes
  • Reflection: MCP and Automation • 10 minutes
  • Key Terms • 10 minutes
  • Reflection: Iterative Development • 10 minutes

Build a web application using multi-modal AI development techniques, progressing from screenshot-to-code conversion through iterative refinement with visual feedback to automated browser testing with MCP and Playwright. The project demonstrates the complete multi-modal development lifecycle including prompt engineering with visual context, GitHub Copilot integration, and documentation-driven iteration.

What's included

3 readings, 1 assignment

3 readings • Total 21 minutes
  • Capstone Reading • 10 minutes
  • Next Steps • 10 minutes
  • Before You Go • 1 minute
1 assignment • Total 15 minutes
  • Final Graded Quiz • 15 minutes

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Pragmatic AI Labs
35 Courses • 2,678 learners

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."
Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
