What you need to know before you start

Start 6 June 2026 12:43

End 6 June 2026


How to Build a Cloud Native Data Lake with Open Source Technologies

Canonical Ubuntu via YouTube

Canonical Ubuntu

6076 Courses


30 minutes

Optional upgrade available

Not Specified

Progress at your own pace

Free Video


Overview

This session is a guide to building a cloud native data lake with open source technologies. It walks through deploying a Kubernetes-based data lake, from the initial setup phase to running a fully functional prototype on your local machine.

Gain hands-on experience and insights into creating data-driven solutions that are efficient and scalable. Perfect for learners interested in artificial intelligence and business courses.

This resource is available on YouTube.

Syllabus

  • Introduction to Cloud Native Data Lakes
    • Overview of cloud native architectures
    • Benefits of data lakes for data storage and analytics
  • Fundamentals of Kubernetes
    • Understanding container orchestration
    • Setting up a local Kubernetes cluster (Minikube, kind, or K3s)
    • Kubernetes basic operations: Pods, Services, and Deployments
  • Open Source Technologies for Data Lakes
    • Apache Hadoop and HDFS
    • Apache Spark for data processing
    • Apache Kafka for real-time data ingestion
  • Storage Layer
    • Setting up distributed file systems
    • Configuring object storage solutions (e.g., MinIO, Ceph)
  • Data Ingestion
    • Configuring data ingestion pipelines with Kafka
    • Exploring ETL tools like Apache NiFi and Apache Airflow
  • Data Processing
    • Running Spark jobs on Kubernetes
    • Implementing batch and stream processing
  • Data Access & Querying
    • Setting up SQL query engines (e.g., Presto, Trino)
    • Using Hive Metastore for schema management
  • Security and Governance
    • Implementing basic security practices
    • Introduction to data governance tools (Apache Atlas)
  • Monitoring and Logging
    • Configuring monitoring tools (Prometheus, Grafana)
    • Log aggregation and monitoring with ELK stack (Elasticsearch, Logstash, Kibana)
  • Deployment and Testing
    • Building a data lake prototype on a local machine
    • Perform testing and data validation
  • Case Study and Hands-on Projects
    • Real-world data lake architecture case studies
    • Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes
  • Conclusion and Future Trends
    • Emerging trends in cloud native data technologies
    • Examining the future of open source data lakes

Subjects

Business