What you need to know before you start

Start 22 July 2026 15:54

End 22 July 2026

How to Build a Cloud Native Data Lake with Open Source Technologies

This guide covers building a cloud native data lake with open source technologies. The session walks through deploying a Kubernetes-based data lake, from initial setup to running a functional prototype on your local machine.

Canonical Ubuntu via YouTube

30 minutes

Optional upgrade available

Not Specified

Progress at your own pace

Free Video

Overview

Gain hands-on experience and insights into creating data-driven solutions that are efficient and scalable. Perfect for learners interested in artificial intelligence and business courses.

This free video is provided by Canonical Ubuntu via YouTube.

Syllabus

Introduction to Cloud Native Data Lakes

Overview of cloud native architectures

Benefits of data lakes for data storage and analytics

Fundamentals of Kubernetes

Understanding container orchestration

Setting up a local Kubernetes cluster (Minikube, kind, or K3s)

Kubernetes basic operations: Pods, Services, and Deployments

Open Source Technologies for Data Lakes

Apache Hadoop and HDFS

Apache Spark for data processing

Apache Kafka for real-time data ingestion

Storage Layer

Setting up distributed file systems

Configuring object storage solutions (e.g., MinIO, Ceph)

Data Ingestion

Configuring data ingestion pipelines with Kafka

Exploring ETL tools like Apache NiFi and Apache Airflow

Data Processing

Running Spark jobs on Kubernetes

Implementing batch and stream processing

Data Access & Querying

Setting up SQL query engines (e.g., Presto, Trino)

Using Hive Metastore for schema management

Security and Governance

Implementing basic security practices

Introduction to data governance tools (Apache Atlas)

Monitoring and Logging

Configuring monitoring tools (Prometheus, Grafana)

Log aggregation and monitoring with the ELK stack (Elasticsearch, Logstash, Kibana)

Deployment and Testing

Building a data lake prototype on a local machine

Performing testing and data validation

Case Study and Hands-on Projects

Real-world data lake architecture case studies

Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes

Conclusion and Future Trends

Emerging trends in cloud native data technologies

Examining the future of open source data lakes

Subjects

Business
