What you need to know before you start

Start 4 June 2026 12:46

End 4 June 2026

Data Selection - Data Challenges when Training Generative Models

Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube

Scalable Parallel Computing Lab, SPCL @ ETH Zurich

6076 courses


1 hour

Optional upgrade available

Not Specified

Progress at your own pace

Free Video

Overview

Syllabus

  • Introduction to Data Selection in Generative Model Training
    Importance of Data Selection
    Overview of Generative Models
  • Filtering Methods for Pre-training
    Data Quality Assessment
    Data Deduplication Techniques
    Noise Reduction Strategies
  • Strategic Data Selection Techniques
    Importance Sampling
    Submodular Optimization Approaches
    Active Learning for Data Curation
  • Optimal Transport Approaches for Fine-tuning
    Principles of Optimal Transport
    Applications in Model Fine-tuning
    Case Studies in Reduced Data Requirements
  • Balancing Data Efficiency and Model Performance
    Trade-offs in Data Selection
    Performance Metrics and Evaluation
  • Case Studies and Industry Applications
    Real-world Examples
    Success Stories and Lessons Learned
  • Tools and Frameworks for Data Selection
    Overview of Available Tools
    Practical Exercises and Tutorials
  • Future Trends and Research Directions
    Emerging Techniques in Data Selection
    Opportunities for Innovation
  • Conclusion and Recap
    Key Takeaways
    Final Thoughts on Data Selection for Generative Models
  • Practical Project
    Design a Data Selection Pipeline
    Implement Filtering and Fine-tuning Strategies

Subjects

Computer Science