What You Need to Know Before You Start

Starts 7 June 2025 05:00

Ends 7 June 2025

Good LLMs Need BAD Data: The Shocking Truth

Discover the counterintuitive finding that including "bad data" in LLM training can lead to more controllable AI systems, as Harvard researchers demonstrate how this approach enables better post-training behavior mitigation.
Discover AI via YouTube

Discover AI

2484 Courses


35 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Syllabus

  • Introduction to LLMs and Data Quality
    • Overview of Large Language Models
    • The role of data in training LLMs
  • Traditional Views on Data Quality in AI
    • The emphasis on high-quality data
    • Risks of poor-quality data in machine learning
  • The Counterintuitive Role of "Bad Data"
    • Definition and examples of "bad data"
    • Introduction to the Harvard study
  • Insights from Harvard's Research
    • Key findings from the study
    • How "bad data" contributes to controllability
  • Mechanisms of Behavior Mitigation
    • Techniques for mitigating AI behavior post-training
    • How "bad data" enhances these methods
  • Case Studies and Practical Applications
    • Real-world examples of "bad data" usage
    • Comparative analysis with traditional methods
  • Designing a Training Dataset
    • Balancing good and bad data
    • Ethical considerations and challenges
  • Implementation Strategies
    • Integrating bad data into the LLM training pipeline
    • Monitoring and evaluating outcomes
  • Future Directions and Research
    • Potential developments in AI data strategy
    • Open questions and ongoing research areas
  • Conclusion and Q&A
    • Summary of key concepts
    • Open floor for discussion and questions

Subjects

Computer Science