What You Need to Know Before You Start

Starts 7 June 2026 07:30

Ends 7 June 2026

Programming Generative AI: Unit 3

Master multimodal AI by exploring contrastive language-image pre-training, latent diffusion models, and text-to-image generation with hands-on fine-tuning techniques.
via Coursera


8 hours 17 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Paid Course

Overview

Unlock the full potential of generative AI with our advanced course module focused on state-of-the-art multimodal models. This course is designed for learners eager to bridge the gap between images and text, and to master the latest techniques in AI-driven content generation.

You’ll begin by exploring the foundational concepts behind multimodal models, learning how contrastive language-image pre-training (CLIP) maps images and text into a shared embedding space. Discover how these models power applications like semantic image search, which lets you query image content with natural-language text and without manual labeling.
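
The idea behind semantic image search can be sketched in a few lines: images and a text query are represented as vectors in the same space, and images are ranked by cosine similarity to the query. The sketch below is illustrative only and is not taken from the course. The three-dimensional vectors, the file names, and the `search` helper are hypothetical stand-ins; in a real system the vectors would come from a pre-trained image encoder and text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, image_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical hand-written embeddings; a real image encoder would produce
# these vectors (typically with hundreds of dimensions).
image_embeddings = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.3],
    "city_skyline.jpg": [0.1, 0.9, 0.2],
    "cat_on_sofa.jpg": [0.7, 0.2, 0.1],
}

# Hypothetical stand-in for a text encoder's output for a pet-related query.
query_embedding = [0.8, 0.1, 0.2]

results = search(query_embedding, image_embeddings)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

No image in this sketch carries a label; the ranking depends only on how close each image vector is to the query vector, which is what makes label-free search possible.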

Dive deeper into the mechanics of latent diffusion models and the inner workings of Stable Diffusion, gaining the skills to turn text prompts into entirely new images. The course also covers essential strategies for evaluating generative models and introduces parameter-efficient methods for fine-tuning and adapting pre-trained models to new styles and subjects.

By the end, you’ll be equipped to build, adapt, and optimize cutting-edge text-to-image systems—ready to innovate in creative, research, or commercial settings.

Syllabus

  • Programming Generative AI: Unit 3
  • This module delves into multimodal generative AI, focusing on models that connect images and text. Learners explore contrastive language-image pre-training (CLIP) for semantic image search and uncover the workings of latent diffusion and Stable Diffusion for text-to-image generation. The module then covers evaluation of generative models, parameter-efficient fine-tuning, and techniques to teach pre-trained models new styles and subjects. It concludes with methods to optimize diffusion models for faster, near real-time image generation, equipping students with both conceptual understanding and practical skills in advanced multimodal AI systems.

Taught by

Pearson


Subjects

Computer Science