Data Selection - Data Challenges when Training Generative Models

Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube


1 hour

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Overview

Explore strategic data selection techniques for efficient generative AI model training, covering filtering methods for pre-training and optimal transport approaches for fine-tuning that reduce data requirements while maintaining performance.

Syllabus

  • Introduction to Data Selection in Generative Model Training
    • Importance of Data Selection
    • Overview of Generative Models
  • Filtering Methods for Pre-training
    • Data Quality Assessment
    • Data Deduplication Techniques
    • Noise Reduction Strategies
  • Strategic Data Selection Techniques
    • Importance Sampling
    • Submodular Optimization Approaches
    • Active Learning for Data Curation
  • Optimal Transport Approaches for Fine-tuning
    • Principles of Optimal Transport
    • Applications in Model Fine-tuning
    • Case Studies in Reduced Data Requirements
  • Balancing Data Efficiency and Model Performance
    • Trade-offs in Data Selection
    • Performance Metrics and Evaluation
  • Case Studies and Industry Applications
    • Real-world Examples
    • Success Stories and Lessons Learned
  • Tools and Frameworks for Data Selection
    • Overview of Available Tools
    • Practical Exercises and Tutorials
  • Future Trends and Research Directions
    • Emerging Techniques in Data Selection
    • Opportunities for Innovation
  • Conclusion and Recap
    • Key Takeaways
    • Final Thoughts on Data Selection for Generative Models
  • Practical Project
    • Design a Data Selection Pipeline
    • Implement Filtering and Fine-tuning Strategies

Subjects

Computer Science