What You Need to Know Before You Start
Starts 4 June 2025 22:03
Ends 4 June 2025
Data Selection - Data Challenges when Training Generative Models
Scalable Parallel Computing Lab, SPCL @ ETH Zurich
via YouTube
1 hour
Optional upgrade available
Not Specified
Progress at your own speed
Free Video
Overview
Explore strategic data selection techniques for efficient generative AI model training, covering filtering methods for pre-training and optimal transport approaches for fine-tuning that reduce data requirements while maintaining performance.
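One way to make "optimal transport approaches for fine-tuning" concrete is to pick candidate examples whose distribution is as close as possible, in OT cost, to a small set of target-task examples. The sketch below is an assumption for illustration, not necessarily the method presented in the talk. It represents examples as embedding vectors, uses entropic OT computed with plain Sinkhorn iterations, and grows the selected subset greedily. The names `sinkhorn_cost` and `ot_select`, the regularization `reg=0.1`, and the toy 2-D "embeddings" are all hypothetical.

```python
import numpy as np


def sinkhorn_cost(cost, reg=0.1, n_iter=200):
    """Entropic OT cost between uniform weights on the rows and columns of `cost`."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on the selected candidates
    b = np.full(m, 1.0 / m)  # uniform mass on the target examples
    kernel = np.exp(-cost / reg)
    v = np.ones(m)
    for _ in range(n_iter):  # Sinkhorn-Knopp scaling iterations
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())


def ot_select(pool, target, k, reg=0.1):
    """Greedily pick k pool rows whose distribution has the lowest OT cost to target."""
    # Pairwise squared Euclidean costs, scaled to [0, 1] once so all scores are comparable.
    cost = ((pool[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    selected, remaining = [], list(range(len(pool)))
    for _ in range(k):
        # Add the candidate that most reduces the OT cost of the selected subset.
        best = min(remaining, key=lambda i: sinkhorn_cost(cost[selected + [i]], reg))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage (hypothetical data): 20 "off-domain" points near (0, 0),
# 20 "in-domain" points near (5, 5), and a small target set near (5, 5).
rng = np.random.default_rng(0)
pool = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
target = rng.normal(5.0, 0.5, (10, 2))
chosen = ot_select(pool, target, k=5)
```

In this toy setup the selected indices all come from the second cluster (indices 20 to 39), the one that matches the target. The greedy loop calls Sinkhorn once per candidate per step, so it is only practical for small pools. Larger-scale variants would need cheaper scoring.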
Syllabus
- Introduction to Data Selection in Generative Model Training
  - Importance of Data Selection
  - Overview of Generative Models
- Filtering Methods for Pre-training
  - Data Quality Assessment
  - Data Deduplication Techniques
  - Noise Reduction Strategies
- Strategic Data Selection Techniques
  - Importance Sampling
  - Submodular Optimization Approaches
  - Active Learning for Data Curation
- Optimal Transport Approaches for Fine-tuning
  - Principles of Optimal Transport
  - Applications in Model Fine-tuning
  - Case Studies in Reduced Data Requirements
- Balancing Data Efficiency and Model Performance
  - Trade-offs in Data Selection
  - Performance Metrics and Evaluation
- Case Studies and Industry Applications
  - Real-world Examples
  - Success Stories and Lessons Learned
- Tools and Frameworks for Data Selection
  - Overview of Available Tools
  - Practical Exercises and Tutorials
- Future Trends and Research Directions
  - Emerging Techniques in Data Selection
  - Opportunities for Innovation
- Conclusion and Recap
  - Key Takeaways
  - Final Thoughts on Data Selection for Generative Models
- Practical Project
  - Design a Data Selection Pipeline
  - Implement Filtering and Fine-tuning Strategies
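A first pass at a pre-training filtering pipeline could combine exact deduplication with simple quality heuristics. The sketch below is a minimal, hypothetical example. The helper names (`normalize`, `passes_quality`, `filter_corpus`) and the thresholds `min_words=5` and `min_alpha_ratio=0.6` are illustrative assumptions, not values from the course. Production pipelines often use near-duplicate detection such as MinHash and model-based quality scores instead.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return re.sub(r"\s+", " ", text.lower()).strip()


def passes_quality(text: str, min_words: int = 5, min_alpha_ratio: float = 0.6) -> bool:
    # Hypothetical heuristics: drop very short documents and documents
    # dominated by digits or symbols rather than letters.
    words = text.split()
    if len(words) < min_words:
        return False
    non_space = [c for c in text if not c.isspace()]
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) >= min_alpha_ratio


def filter_corpus(docs):
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if passes_quality(doc):
            kept.append(doc)
    return kept


# Toy usage with made-up documents.
docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox jumps over the LAZY dog.",  # duplicate once normalized
    "Too short.",
    "1234 5678 #### $$$$ 9999 0000 %%%%",
    "Data selection can reduce the amount of training data needed.",
]
kept = filter_corpus(docs)  # keeps the first and last documents
```

Deduplication runs before the quality check, so every distinct document is hashed exactly once, and later copies are skipped without re-scoring.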
Subjects
Computer Science