What You Need to Know Before You Start

Starts 6 June 2026 13:43

Ends 6 June 2026


How to Build a Cloud Native Data Lake with Open Source Technologies

Canonical Ubuntu via YouTube


30 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Optional upgrade available

Overview

This session is a guide to building a cloud native data lake with open source technologies. It walks through deploying a Kubernetes-based data lake, from initial setup to running a fully functional prototype on your local machine.

Gain hands-on experience and insights into creating data-driven solutions that are efficient and scalable. Perfect for learners interested in artificial intelligence and business courses.

This session is available as a free video on YouTube.

Syllabus

  • Introduction to Cloud Native Data Lakes
    • Overview of cloud native architectures
    • Benefits of data lakes for data storage and analytics
  • Fundamentals of Kubernetes
    • Understanding container orchestration
    • Setting up a local Kubernetes cluster (Minikube, kind, or K3s)
    • Kubernetes basic operations: Pods, Services, and Deployments
  • Open Source Technologies for Data Lakes
    • Apache Hadoop and HDFS
    • Apache Spark for data processing
    • Apache Kafka for real-time data ingestion
  • Storage Layer
    • Setting up distributed file systems
    • Configuring object storage solutions (e.g., MinIO, Ceph)
  • Data Ingestion
    • Configuring data ingestion pipelines with Kafka
    • Exploring ETL tools like Apache NiFi and Apache Airflow
  • Data Processing
    • Running Spark jobs on Kubernetes
    • Implementing batch and stream processing
  • Data Access & Querying
    • Setting up SQL query engines (e.g., Presto, Trino)
    • Using Hive Metastore for schema management
  • Security and Governance
    • Implementing basic security practices
    • Introduction to data governance tools (Apache Atlas)
  • Monitoring and Logging
    • Configuring monitoring tools (Prometheus, Grafana)
    • Log aggregation and monitoring with ELK stack (Elasticsearch, Logstash, Kibana)
  • Deployment and Testing
    • Building a data lake prototype on a local machine
    • Perform testing and data validation
  • Case Study and Hands-on Projects
    • Real-world data lake architecture case studies
    • Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes
  • Conclusion and Future Trends
    • Emerging trends in cloud native data technologies
    • Examining the future of open source data lakes

Subjects

Business