What You Need to Know Before You Start
Starts 7 June 2025 12:10
Ends 7 June 2025
30 minutes
Optional upgrade available
Not Specified
Progress at your own speed
Free Video
Overview
Discover how to deploy a Kubernetes-based data lake using open source tools, from initial setup to running a complete prototype platform on your local machine.
Syllabus
- Introduction to Cloud Native Data Lakes
- Fundamentals of Kubernetes
- Open Source Technologies for Data Lakes
- Storage Layer
- Data Ingestion
- Data Processing
- Data Access & Querying
- Security and Governance
- Monitoring and Logging
- Deployment and Testing
- Case Study and Hands-on Projects
- Conclusion and Future Trends
Introduction to Cloud Native Data Lakes
- Overview of cloud native architectures
- Benefits of data lakes for data storage and analytics
Fundamentals of Kubernetes
- Understanding container orchestration
- Setting up a local Kubernetes cluster (Minikube, kind, or K3s)
- Kubernetes basic operations: Pods, Services, and Deployments
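To give a flavor of the basic operations above, here is a minimal sketch of a Deployment paired with a Service, ready for `kubectl apply -f` on any local cluster. The `hello-web` name and the `nginx` image are placeholders, not part of the course materials:

```yaml
# Deployment: runs two replicas of a container as managed Pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
---
# Service: gives the Pods one stable in-cluster name and IP
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  selector:
    app: hello-web
  ports:
  - port: 80
    targetPort: 80
```

The Service selects Pods by label, so it keeps working as the Deployment replaces Pods during updates.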
Open Source Technologies for Data Lakes
- Apache Hadoop and HDFS
- Apache Spark for data processing
- Apache Kafka for real-time data ingestion
Storage Layer
- Setting up distributed file systems
- Configuring object storage solutions (e.g., MinIO, Ceph)
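As a sketch of the object-storage setup, the manifest below runs a single-node MinIO suitable for local prototyping. The names, credentials, and ephemeral volume are placeholder assumptions, not production choices:

```yaml
# Single-node MinIO for local prototyping (not a production HA setup)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio:latest
        args: ["server", "/data", "--console-address", ":9001"]
        env:
        - name: MINIO_ROOT_USER
          value: minioadmin        # placeholder credentials; use a Secret in practice
        - name: MINIO_ROOT_PASSWORD
          value: minioadmin
        ports:
        - containerPort: 9000      # S3-compatible API
        - containerPort: 9001      # web console
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        emptyDir: {}               # ephemeral; swap for a PersistentVolumeClaim
```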
Data Ingestion
- Configuring data ingestion pipelines with Kafka
- Exploring ETL tools like Apache NiFi and Apache Airflow
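If Kafka on the cluster is managed by the Strimzi operator (an assumption; the course does not mandate a particular operator), an ingestion topic can be declared as a Kubernetes resource. The topic and cluster names here are placeholders:

```yaml
# Declarative topic definition, reconciled by the Strimzi topic operator
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: events-raw
  labels:
    strimzi.io/cluster: my-cluster   # must match the Kafka cluster's name
spec:
  partitions: 3
  replicas: 1                        # single-broker local cluster
  config:
    retention.ms: 604800000          # keep raw events for 7 days
```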
Data Processing
- Running Spark jobs on Kubernetes
- Implementing batch and stream processing
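One common way to run Spark jobs on Kubernetes is the Kubeflow Spark operator, which replaces hand-rolled `spark-submit` invocations with a declarative resource. A minimal sketch under that assumption (the job name, image tag, script path, and service account are placeholders):

```yaml
# Declarative Spark batch job submitted through the Spark operator
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: daily-batch
spec:
  type: Python
  mode: cluster
  image: spark:3.5.1
  mainApplicationFile: s3a://jobs/daily_batch.py   # placeholder path in object storage
  sparkVersion: "3.5.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 1g
```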
Data Access & Querying
- Setting up SQL query engines (e.g., Presto, Trino)
- Using Hive Metastore for schema management
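Tying the query engine to the metastore typically comes down to a catalog file. The sketch below mounts a Trino Hive catalog from a ConfigMap; the service names, port, and property values are assumptions for a local MinIO-backed setup:

```yaml
# Trino "hive" catalog pointing at a Hive Metastore and MinIO object storage
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-catalog
data:
  hive.properties: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
    hive.s3.endpoint=http://minio:9000
    hive.s3.path-style-access=true
```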
Security and Governance
- Implementing basic security practices
- Introduction to data governance tools (Apache Atlas)
Monitoring and Logging
- Configuring monitoring tools (Prometheus, Grafana)
- Log aggregation and monitoring with ELK stack (Elasticsearch, Logstash, Kibana)
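A small taste of the Prometheus side: this `prometheus.yml` fragment uses Kubernetes service discovery so that any Pod opting in via an annotation gets scraped automatically. The job name is a placeholder:

```yaml
# prometheus.yml fragment: discover Pods, scrape only those that opt in
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # keep only Pods annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
```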
Deployment and Testing
- Building a data lake prototype on a local machine
- Testing and data validation
Case Study and Hands-on Projects
- Real-world data lake architecture case studies
- Capstone project: Deploy a cloud native data lake using open source tools on Kubernetes
Conclusion and Future Trends
- Emerging trends in cloud native data technologies
- Examining the future of open source data lakes
Subjects
Business