What You Need to Know Before You Start

Starts 5 July 2026 13:10

Ends 5 July 2026

Generative AI for Computer Vision

Explore Generative AI for computer vision, covering GANs, VAEs, diffusion models, transformers, and LLMs like GPT-4V for tasks like image captioning, VQA, and multimodal reasoning.

NPTEL via Swayam

Not Specified

Optional upgrade available

Advanced

Progress at your own speed

Free Online Course

Overview

ABOUT THE COURSE:

This course explores how Generative AI is applied to modern computer vision tasks. Unlike existing NPTEL courses, it specifically emphasizes vision-based generative AI models.

It begins with mathematical foundations and classical vision techniques, followed by deep learning architectures. The course then introduces generative learning paradigms, including GANs, VAEs, diffusion models, and transformers, and discusses evaluation metrics and training challenges such as mode collapse and diffusion noise scheduling.
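As a rough illustration of what "diffusion noise scheduling" refers to, the sketch below builds a linear variance schedule and applies the closed-form forward noising step. It is not taken from the course material: the function names are hypothetical, and the default endpoints (1e-4 and 0.02 over 1000 steps) are assumed from the commonly cited DDPM setup.

```python
import math


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances beta_1..beta_T, linearly spaced.

    The default endpoints are an assumption (DDPM-style), not course values.
    """
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Return alpha_bar_t = product over s <= t of (1 - beta_s).

    alpha_bar_t is the fraction of the original signal variance left at step t.
    """
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_sample(x0, alpha_bar, eps):
    """Closed-form forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
```

With these assumed defaults, the retained signal fraction decays monotonically toward zero as the step index grows, which is why the choice of schedule affects how quickly an image is destroyed during training.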

Moreover, it covers large language models for vision applications, such as GPT-4V, LLaMA, PaLM-E, and Flamingo. The course focuses primarily on deep generative learning for computer vision tasks such as image captioning, visual question answering (VQA), and scene understanding.

It further discusses multimodal generative models and agentic AI systems for automatic image synthesis and reasoning.

INTENDED AUDIENCE:

Final/Pre-final year B.Tech/BE, M.Tech/ME, MS, PhD students, Industry professionals, and Faculty members.

PREREQUISITES:

Basics of Machine Learning and Computer Vision. Neural Networks for Vision and NLP.

INDUSTRY SUPPORT:

Relevant for AI/ML roles in IT companies, startups, research labs, and product-based companies working in generative AI and computer vision domains.

Syllabus

Introduction

Overview of Generative AI in Computer Vision

Course structure and objectives

Prerequisites review

Mathematical Foundations

Probability and statistics for generative models

Linear algebra and optimization techniques

Classical methods in computer vision

Classical Vision Techniques

Feature detection and extraction

Image filtering and transformation

Segmentation and object recognition

Deep Learning Architectures

Convolutional Neural Networks (CNNs)

Recurrent Neural Networks (RNNs) and LSTMs

Attention mechanisms and Transformers

Generative Learning Paradigms

Generative Adversarial Networks (GANs)

Architecture and loss functions

Mode collapse and evaluation metrics

Variational Autoencoders (VAEs)

Latent space representation

Regularization techniques

Diffusion Models

Noising and denoising processes

Noise scheduling methods

Multimodal Generative Models

Overview and significance

Agentic AI systems for image synthesis

Transformers and Vision Applications

Vision Transformers (ViT)

Large Language Models for vision

GPT-4V, LLaMA, PaLM-E, Flamingo

Image Captioning and Visual Question Answering (VQA)

Scene understanding and image synthesis

Training Challenges and Evaluation Metrics

Overfitting and underfitting remedies

Mode collapse in GANs

Evaluation aspects for generative models

Application Domains and Case Studies

Image Captioning and Visual Question Answering

Scene Understanding

Automatic Image Synthesis and Reasoning

Industry Use Cases and Open Research Areas

Real-world applications

Current trends in research and development

Course Conclusion

Summary of key learning outcomes

Future directions and career pathways in Generative AI for CV

Assignments and Project Work

Weekly exercises and coding assignments

End-of-course project: Implementing a vision-based generative AI model

Additional Resources

Recommended readings and research papers

Online forums and communities for further learning

Taught by

Prof. Arijit Sur

Subjects

Artificial Intelligence
