What You Need to Know Before You Start

Starts 1 July 2025 11:52

Ends 1 July 2025


Open Source and the Data Lakehouse - Understanding Components and Technologies

OSACon via YouTube



26 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video


Overview

Join us on a journey to understand the transformative power of data lakehouses, where cutting-edge technology meets economic efficiency. Delve into the world of Apache Arrow, Apache Iceberg, and Project Nessie, and discover how they serve as open-source alternatives to traditional data warehouses.

This exploration offers insights into how these open-source components maximize both performance and affordability, paving the way for advancements in data handling and storage.

Syllabus

  • Introduction to Data Lakehouses
    Definition and key characteristics
    Comparison with data warehouses and data lakes
    Benefits and limitations of data lakehouses
  • Core Components of Data Lakehouses
    Storage and compute separation
    Metadata management
    Query engines and optimization
  • Apache Arrow
    Overview of Apache Arrow
    In-memory columnar format
    Performance benefits for data lakehouses
    Integration with other data technologies
  • Apache Iceberg
    Introduction to Apache Iceberg
    Architecture and features
    Advantages over traditional table formats
    Use cases and implementation examples
  • Project Nessie
    Overview of Project Nessie
    Version control for data lakehouses
    Branching, merging, and reproducibility
    Ecosystem and integration
  • Comparing Open Source Data Lakehouse Technologies
    Use cases and performance comparisons
    Cost and affordability analysis
    Case studies of successful implementations
  • Practical Considerations and Best Practices
    Data governance and security
    Performance optimization strategies
    Choosing the right components for specific needs
  • Future Trends and Developments in Data Lakehouses
    Emerging technologies and innovations
    Industry adoption and evolution
    Speculations on future directions in data management
  • Course Review and Final Thoughts
    Recap of key concepts and technologies
    Discussion on the impact of data lakehouses in the industry
    Q&A and interactive discussions

Subjects

Business