מה צריך לדעת לפני
שתתחיל

מתחיל 17 July 2026 02:57

נגמר 17 July 2026

00 ימים

00 שעות

00 דקות

00 שניות

הרשמה

Generative AI for Audio and Images: Models and Applications

Explore advanced generative AI models like VAEs, GANs, Transformers, and Diffusion for creating audio, image, and video content with ethical considerations.

Alberta Machine Intelligence Institute via Coursera

1 day 5 hours 17 minutes

שדרוג אופציונלי זמין

מתקדם

התקדמות בקצב שלך

Paid Course

שדרוג אופציונלי זמין

סקירה כללית

Generative AI for Audio and Images:

Models and Applications offers an in-depth exploration of how modern generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion models are used to create, manipulate, and enhance audio, image, and video content. Learners examine the architectures, training processes, and use cases of these models across different modalities, gaining both conceptual understanding and practical insights through hands-on activities.

The course also highlights the ethical and societal implications of generative AI, including bias, transparency, intellectual property, and the challenges of deepfake technologies. By covering foundational theory as well as state-of-the-art approaches and applications, this course prepares learners to apply and develop generative AI creatively and responsibly for the audio and image modalities.

By the end of this course, learners will be able to:

Outline core concepts, challenges, and the history of AI-generated audio. Analyze important foundational audio generation models, such as variational and vector quantized autoencoders (VAE and VQ-VAE) Examine how these models integrate with the latest GenAI technologies to form hybrid, state-of-the-art transformer and diffusion-based audio generation systems, Study the architecture and functionality of Generative Adversarial Networks (GANs), and their variations.

Implement and train GAN models for creating and enhancing visual content, Explore cutting-edge techniques such as diffusion models and transformers for image and video creation. Discuss the ethical considerations regarding generative AI for audio and images.

סילבוס

The Fundamentals of AI-generated audio

This module introduces the foundations and core concepts of AI-generated audio. Learners explore why audio generation is uniquely challenging, such representation and evaluation challenges. They learn how audio is represented and processed, compare waveform and symbolic formats, and common audio data formats and Python libraries for working with audio. The module also examines methods for evaluating generated audio and provides a framework for categorizing audio generation approaches by their functionality and human–AI collaboration level. It concludes with a historical overview of AI-generated audio, tracing its evolution from early rule-based methods to modern deep generative models.

Advanced audio generation with Generative AI

Building on the fundamentals, this module dives into advanced models for audio generation. Learners study Variational Autoencoders (VAEs) and their variants, and how they apply to melody generation and speech synthesis. The module also explores transformer-based models, such as Music Transformer, AudioLM, and FastSpeech, as well as diffusion-based models like DiffWave and Stable Audio. Through these lessons, learners gain a comprehensive understanding of how modern generative architectures produce realistic, high-quality audio and music.

Introduction to Generative Image Models

This module transitions from audio to image generation, introducing the principles and evolution of image and video synthesis. Learners examine key architectures like GANs and VAEs, explore how adversarial training works, and study variations such as Conditional and Progressive GANs, Pix2Pix, and CycleGAN. The module also connects theory to practice by showcasing creative and commercial applications—from art and design to data augmentation—demonstrating how generative models enhance realism and variety in visual outputs.

Advanced Image and Video Generation with Generative AI

In this module,we explore the final stages of what large language models (LLMs) can offer. You’ll learn how and when to use fine-tuning, along with the pros and cons of different approaches. Throughout the course, you will receive relevant assignments that prepare you for the capstone project: building a fully functional chatbot

נלמד על ידי

Anahita Doosti and Soroush Razavi

נושאים

Artificial Intelligence

מה צריך לדעת לפני שתתחיל

Generative AI for Audio and Images: Models and Applications

1 day 5 hours 17 minutes

מתקדם

Paid Course

סקירה כללית

סילבוס

נלמד על ידי

נושאים

AI for FP&A Automation & Modeling

FP&A with AI: Capstone Project

Interpretability of LLMs - Generating SAE Feature Descriptions - Spring 2026

CodeCloak: A DRL-Based Method for Mitigating Code Leakage by LLM Code Assistants

Generative AI for NLP with PyTorch

Machine Learning Engineer: ML and Deep Learning Models

מה צריך לדעת לפני
שתתחיל