What You Need to Know Before You Start
Starts 7 June 2025 05:00
Ends 7 June 2025
Good LLMs Need BAD Data: The Shocking Truth
Discover AI via YouTube
Discover AI
2484 Courses
35 minutes
Optional upgrade available
Not Specified
Progress at your own speed
Free Video
Overview
Discover the counterintuitive finding that including "bad data" in LLM training can lead to more controllable AI systems, as Harvard researchers demonstrate how this approach enables better post-training behavior mitigation.
Syllabus
- Introduction to LLMs and Data Quality
  - Overview of Large Language Models
  - The role of data in training LLMs
- Traditional Views on Data Quality in AI
  - The emphasis on high-quality data
  - Risks of poor-quality data in machine learning
- The Counterintuitive Role of "Bad Data"
  - Definition and examples of "bad data"
  - Introduction to the Harvard study
- Insights from Harvard's Research
  - Key findings from the study
  - How "bad data" contributes to controllability
- Mechanisms of Behavior Mitigation
  - Techniques for mitigating AI behavior post-training
  - How "bad data" enhances these methods
- Case Studies and Practical Applications
  - Real-world examples of "bad data" usage
  - Comparative analysis with traditional methods
- Designing a Training Dataset
  - Balancing good and bad data
  - Ethical considerations and challenges
- Implementation Strategies
  - Integrating bad data into the LLM training pipeline
  - Monitoring and evaluating outcomes
- Future Directions and Research
  - Potential developments in AI data strategy
  - Open questions and ongoing research areas
- Conclusion and Q&A
  - Summary of key concepts
  - Open floor for discussion and questions
Subjects
Computer Science