What You Need to Know Before You Start

Starts 7 June 2025 12:10

Ends 7 June 2025

How to Build a Cloud Native Data Lake with Open Source Technologies

Canonical Ubuntu via YouTube

30 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Overview

Discover how to deploy a Kubernetes-based data lake using open source tools, from initial setup to running a complete prototype platform on your local machine.

Syllabus

Introduction to Cloud Native Data Lakes

Overview of cloud native architectures

Benefits of data lakes for data storage and analytics

Fundamentals of Kubernetes

Understanding container orchestration

Setting up a local Kubernetes cluster (Minikube, kind, or K3s)

Basic Kubernetes operations: Pods, Services, and Deployments

Open Source Technologies for Data Lakes

Apache Hadoop and HDFS

Apache Spark for data processing

Apache Kafka for real-time data ingestion

Storage Layer

Setting up distributed file systems

Configuring object storage solutions (e.g., MinIO, Ceph)

Data Ingestion

Configuring data ingestion pipelines with Kafka

Exploring ETL tools like Apache NiFi and Apache Airflow

Data Processing

Running Spark jobs on Kubernetes

Implementing batch and stream processing

Data Access & Querying

Setting up SQL query engines (e.g., Presto, Trino)

Using Hive Metastore for schema management

Security and Governance

Implementing basic security practices

Introduction to data governance tools (Apache Atlas)

Monitoring and Logging

Configuring monitoring tools (Prometheus, Grafana)

Log aggregation and monitoring with the ELK stack (Elasticsearch, Logstash, Kibana)

Deployment and Testing

Building a data lake prototype on a local machine

Performing testing and data validation

Case Study and Hands-on Projects

Real-world data lake architecture case studies

Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes

Conclusion and Future Trends

Emerging trends in cloud native data technologies

Examining the future of open source data lakes

Subjects

Business
