What You Need to Know Before You Start

Starts 13 June 2026 08:46

Ends 13 June 2026

Generative AI for Computer Vision

Explore Generative AI for computer vision, covering GANs, VAEs, diffusion models, transformers, and multimodal LLMs such as GPT-4V, for tasks such as image captioning, visual question answering (VQA), and multimodal reasoning.
NPTEL via Swayam

NPTEL

154 Courses


Not Specified

Optional upgrade available

Advanced

Progress at your own speed

Free Online Course

Overview

ABOUT THE COURSE:

This course explores how Generative AI is applied to modern computer vision tasks. Unlike existing NPTEL courses, it focuses specifically on vision-based generative AI models.

It begins with mathematical foundations and classical vision techniques, then moves on to deep learning architectures. The course next introduces generative learning paradigms, including GANs, VAEs, diffusion models, and transformers, along with evaluation metrics and training challenges such as mode collapse in GANs and noise scheduling in diffusion models.

It also covers large language models (LLMs) for vision applications, such as GPT-4V, LLaMA, PaLM-E, and Flamingo. The primary focus is deep generative learning for computer vision tasks such as image captioning, visual question answering (VQA), and scene understanding.

It further discusses multimodal generative models and agentic AI systems for automatic image synthesis and reasoning.

INTENDED AUDIENCE:

Final-year and pre-final-year B.Tech/BE students, M.Tech/ME, MS, and PhD students, industry professionals, and faculty members.

PREREQUISITES:

Basics of machine learning and computer vision; neural networks for vision and NLP.

INDUSTRY SUPPORT:

Relevant for AI/ML roles in IT companies, startups, research labs, and product-based companies working in generative AI and computer vision domains.

Syllabus

  • Introduction
    Overview of Generative AI in Computer Vision
    Course structure and objectives
    Prerequisites review
  • Mathematical Foundations
    Probability and statistics for generative models
    Linear algebra and optimization techniques
    Classical methods in computer vision
  • Classical Vision Techniques
    Feature detection and extraction
    Image filtering and transformation
    Segmentation and object recognition
  • Deep Learning Architectures
    Convolutional Neural Networks (CNNs)
    Recurrent Neural Networks (RNNs) and LSTMs
    Attention mechanisms and Transformers
  • Generative Learning Paradigms
    Generative Adversarial Networks (GANs)
      Architecture and loss functions
      Mode collapse and evaluation metrics
    Variational Autoencoders (VAEs)
      Latent space representation
      Regularization techniques
    Diffusion Models
      Noising and denoising processes
      Noise scheduling methods
  • Multimodal Generative Models
    Overview and significance
    Agentic AI systems for image synthesis
  • Transformers and Vision Applications
    Vision Transformers (ViT)
    Large Language Models for vision
      GPT-4V, LLaMA, PaLM-E, Flamingo
    Image Captioning and Visual Question Answering (VQA)
    Scene understanding and image synthesis
  • Training Challenges and Evaluation Metrics
    Overfitting and underfitting remedies
    Mode collapse in GANs
    Evaluation aspects for generative models
  • Application Domains and Case Studies
    Image Captioning and Visual Question Answering
    Scene Understanding
    Automatic Image Synthesis and Reasoning
  • Industry Use Cases and Open Research Areas
    Real-world applications
    Current trends in research and development
  • Course Conclusion
    Summary of key learning outcomes
    Future directions and career pathways in Generative AI for CV
  • Assignments and Project Work
    Weekly exercises and coding assignments
    End-of-course project: Implementing a vision-based generative AI model
  • Additional Resources
    Recommended readings and research papers
    Online forums and communities for further learning

Taught by

Prof. Arijit Sur


Subjects

Artificial Intelligence