What You Need to Know Before You Start

Starts 7 June 2025 12:10

Ends 7 June 2025

How to Build a Cloud Native Data Lake with Open Source Technologies

Canonical Ubuntu via YouTube

30 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Overview

Discover how to deploy a Kubernetes-based data lake using open source tools, from initial setup to running a complete prototype platform on your local machine.

Syllabus

  • Introduction to Cloud Native Data Lakes
      • Overview of cloud native architectures
      • Benefits of data lakes for data storage and analytics
  • Fundamentals of Kubernetes
      • Understanding container orchestration
      • Setting up a local Kubernetes cluster (Minikube, kind, or K3s)
      • Kubernetes basic operations: Pods, Services, and Deployments
  • Open Source Technologies for Data Lakes
      • Apache Hadoop and HDFS
      • Apache Spark for data processing
      • Apache Kafka for real-time data ingestion
  • Storage Layer
      • Setting up distributed file systems
      • Configuring object storage solutions (e.g., MinIO, Ceph)
  • Data Ingestion
      • Configuring data ingestion pipelines with Kafka
      • Exploring ETL tools like Apache NiFi and Apache Airflow
  • Data Processing
      • Running Spark jobs on Kubernetes
      • Implementing batch and stream processing
  • Data Access & Querying
      • Setting up SQL query engines (e.g., Presto, Trino)
      • Using Hive Metastore for schema management
  • Security and Governance
      • Implementing basic security practices
      • Introduction to data governance tools (Apache Atlas)
  • Monitoring and Logging
      • Configuring monitoring tools (Prometheus, Grafana)
      • Log aggregation and monitoring with ELK stack (Elasticsearch, Logstash, Kibana)
  • Deployment and Testing
      • Building a data lake prototype on a local machine
      • Performing testing and data validation
  • Case Study and Hands-on Projects
      • Real-world data lake architecture case studies
      • Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes
  • Conclusion and Future Trends
      • Emerging trends in cloud native data technologies
      • Examining the future of open source data lakes

Subjects

Business