What You Need to Know Before You Start

Starts 4 June 2025 15:51

Ends 4 June 2025


AI and ML: The Critical Operational Side of Running Applications in Kubernetes

Discover how to effectively manage AI and ML operations using service mesh, focusing on GPU workloads, multitenancy, and scaling in Kubernetes environments for reliable and observable ML applications.
CNCF [Cloud Native Computing Foundation] via YouTube


28 minutes

Optional upgrade available

Not Specified

Progress at your own speed

Free Video

Syllabus

  • Introduction to Kubernetes for AI/ML
    Overview of Kubernetes architecture
    Key concepts: pods, nodes, and clusters
    Kubernetes networking basics
  • Understanding AI and ML Workloads in Kubernetes
    Characteristics of AI/ML workloads
    Common challenges in deploying AI/ML on Kubernetes
    Introduction to GPU utilization in Kubernetes
  • Service Mesh Fundamentals
    Definition and benefits of a service mesh
    Overview of popular service mesh technologies (Istio, Linkerd, etc.)
    Implementing a service mesh in Kubernetes for AI/ML applications
  • Managing GPU Workloads with Kubernetes
    Configuring Kubernetes for GPU scheduling
    Best practices for GPU resource management
    Tools and frameworks for optimizing GPU workload performance
  • Multitenancy in Kubernetes
    Approaches to achieving multitenancy
    Managing namespaces and resource quotas
    Security considerations in multitenant environments
  • Scaling AI/ML Applications in Kubernetes
    Horizontal and vertical pod autoscaling
    Load balancing and resilience strategies
    Handling stateful vs stateless workloads
  • Observability and Monitoring in AI/ML Operations
    Setting up monitoring for Kubernetes-based applications
    Using tools like Prometheus and Grafana
    Implementing logging and tracing
  • Case Studies and Best Practices
    Real-world examples of AI/ML deployments in Kubernetes
    Lessons learned and best practices from industry
  • Summary and Q&A
    Key takeaways from the course
    Open floor for questions and clarifications

Subjects

Programming