Multi-modal AI
This course is part of AI Tooling Specialization
Instructor: Alfredo Deza
What you'll learn
Apply multi-modal AI techniques to convert screenshots into working code, using prompt engineering with visual context and GitHub Copilot
Details to know
April 2026
1 assignment
There are 3 modules in this course
Learn to build production applications by combining visual and textual inputs with AI coding tools. You will explore multi-modal programming where screenshots, images, and text serve as inputs for AI-assisted code generation, and set up development environments configured for visual AI workflows. The course covers prompt engineering with visual context to improve code generation accuracy, and hands-on development with GitHub Copilot in VS Code for inline suggestions and chat-based interactions. You will build a complete project using live reload and browser developer tools for rapid feedback between AI generation and visual output.
The iterative development module teaches documentation-driven design where documentation guides AI toward desired outcomes, image-based iteration for refining generated code through visual comparison, and automated checks and validations that maintain quality through development cycles. You will learn to identify and overcome common iteration challenges including regression and context drift.
The advanced module covers Model Context Protocol for connecting AI tools with external capabilities, Playwright for browser automation and visual testing, and Playwright MCP for AI-driven browser interactions that validate web applications directly.
By completing this course, you will be able to convert screenshots into production code through iterative, automated, multi-modal AI workflows.
Covers multi-modal, screenshots, overview, programming, and visual.
What's included
15 videos, 6 readings
15 videos • Total 52 minutes
- Course Introduction • 1 minute
- What Is Multi-Modal Programming • 3 minutes
- Setting Up Multi-Modal Dev Environments • 6 minutes
- Your First Screenshot to Code Conversion • 6 minutes
- Lesson 1.1 Conclusion • 0 minutes
- Prompt Engineering Introduction • 1 minute
- Prompt Engineering with Visual Context • 5 minutes
- Introduction to GitHub Copilot and VS Code • 5 minutes
- Developing with GitHub Copilot • 5 minutes
- Lesson 1.2 Conclusion • 1 minute
- Building Introduction • 1 minute
- What Will We Build • 4 minutes
- Live Reload and Developer Tools • 7 minutes
- Setting Up the Development Environment • 7 minutes
- Lesson 1.3 Conclusion • 1 minute
6 readings • Total 6 minutes
- Key Terms • 1 minute
- Reflection • 1 minute
- Key Terms • 1 minute
- Reflection • 1 minute
- Key Terms • 1 minute
- Reflection • 1 minute
Covers iterative, documentation, iteration, designing, and context.
What's included
12 videos, 4 readings
12 videos • Total 39 minutes
- MCP and Automation Introduction • 1 minute
- Introduction to MCP • 4 minutes
- Overview of Playwright • 4 minutes
- Using Playwright MCP • 6 minutes
- Overview of What We Built • 3 minutes
- Course Conclusion • 1 minute
- Iterative Development Introduction • 1 minute
- Designing with Documentation • 4 minutes
- Iterating Over First Changes • 4 minutes
- Using Images for Iteration • 6 minutes
- Challenges with Iteration • 3 minutes
- Automating Checks and Validations • 4 minutes
4 readings • Total 40 minutes
- Key Terms • 10 minutes
- Reflection: MCP and Automation • 10 minutes
- Key Terms • 10 minutes
- Reflection: Iterative Development • 10 minutes
Build a web application using multi-modal AI development techniques, progressing from screenshot-to-code conversion through iterative refinement with visual feedback to automated browser testing with MCP and Playwright. The project demonstrates the complete multi-modal development lifecycle including prompt engineering with visual context, GitHub Copilot integration, and documentation-driven iteration.
What's included
3 readings, 1 assignment
3 readings • Total 21 minutes
- Capstone Reading • 10 minutes
- Next Steps • 10 minutes
- Before You Go • 1 minute
1 assignment • Total 15 minutes
- Final Graded Quiz • 15 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Frequently asked questions
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.
