What You Need to Know Before You Start

Starts 4 July 2025 16:24

Ends 4 July 2025

Data Selection - Data Challenges when Training Generative Models

Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube

1 hour

Optional upgrade available

Progress at your own speed

Free Video

Syllabus

  • Introduction to Data Selection in Generative Model Training
      • Importance of Data Selection
      • Overview of Generative Models
  • Filtering Methods for Pre-training
      • Data Quality Assessment
      • Data Deduplication Techniques
      • Noise Reduction Strategies
  • Strategic Data Selection Techniques
      • Importance Sampling
      • Submodular Optimization Approaches
      • Active Learning for Data Curation
  • Optimal Transport Approaches for Fine-tuning
      • Principles of Optimal Transport
      • Applications in Model Fine-tuning
      • Case Studies in Reduced Data Requirements
  • Balancing Data Efficiency and Model Performance
      • Trade-offs in Data Selection
      • Performance Metrics and Evaluation
  • Case Studies and Industry Applications
      • Real-world Examples
      • Success Stories and Lessons Learned
  • Tools and Frameworks for Data Selection
      • Overview of Available Tools
      • Practical Exercises and Tutorials
  • Future Trends and Research Directions
      • Emerging Techniques in Data Selection
      • Opportunities for Innovation
  • Conclusion and Recap
      • Key Takeaways
      • Final Thoughts on Data Selection for Generative Models
  • Practical Project
      • Design a Data Selection Pipeline
      • Implement Filtering and Fine-tuning Strategies

Subjects

Computer Science