What you need to know before you start

Starts 22 July 2026 23:03

Ends 22 July 2026

Scale to 0 LLM Inference: Cost Efficient Open Model Deployment on Serverless GPUs

Learn how to deploy open LLMs on serverless GPUs that scale to zero when idle. This session walks through running Ollama on that infrastructure for cost-effective open-model deployment, giving you complete control over both models and private data while optimizing performance and expenditure.

Devoxx via YouTube

17 minutes

Optional upgrade available

Not Specified

Self-paced

Free Video

Overview

Gain complete control over both models and private data, optimizing performance and expenditure.

Syllabus

**Introduction to Serverless GPU Computing**

What is serverless computing?

Benefits of serverless infrastructures for AI/ML

Understanding GPU utilization and scaling

**Overview of Ollama and LLM Deployment**

What is Ollama?

Introduction to Large Language Models (LLMs)

Importance of model and data privacy

**Setting Up a Serverless Environment**

Selecting a cloud provider

Setting up serverless GPU resources

Configuring security and access permissions

**Deploying LLMs on Serverless GPUs**

Installing and configuring Ollama

Model selection and preparation

Packaging and deploying an LLM
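Once an Ollama instance is deployed, clients talk to it over its HTTP API. The following is a minimal sketch, not taken from the session itself, of building a request for Ollama's `/api/generate` endpoint using only the Python standard library. The base URL (Ollama's default local address) and the model name `gemma2:2b` are placeholder assumptions; a serverless deployment would expose its own service URL and whichever model was packaged with it.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. A serverless deployment
# would expose its own service URL instead.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(
    prompt: str,
    model: str = "gemma2:2b",  # placeholder model name, not from the session
    base_url: str = OLLAMA_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    # stream=False asks for a single JSON object instead of a chunked stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` requires a reachable Ollama server. On a scale-to-zero service, the first call after an idle period is likely to be slower because the instance and model have to start up first.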

**Cost Optimization Strategies**

Scaling to zero: Understanding and leveraging scale-down strategies

Monitoring usage and costs

Implementing usage-based triggers
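The cost argument for scaling to zero can be illustrated with simple arithmetic. The sketch below compares an always-on GPU instance with one billed only while it is serving or starting up. All rates, hours, and cold-start figures are hypothetical and not taken from the session; substitute your provider's actual pricing and billing rules.

```python
def monthly_cost_always_on(hourly_rate: float, hours_in_month: float = 730.0) -> float:
    """Cost of a GPU instance that runs for the whole month."""
    return hourly_rate * hours_in_month


def monthly_cost_scale_to_zero(
    hourly_rate: float,
    busy_hours: float,
    cold_starts: int = 0,
    cold_start_seconds: float = 0.0,
) -> float:
    """Cost when the instance is billed only while serving or starting up.

    Assumes cold-start time is billed at the same hourly rate as serving
    time, which may not match a given provider's billing model.
    """
    startup_hours = cold_starts * cold_start_seconds / 3600.0
    return hourly_rate * (busy_hours + startup_hours)


# Hypothetical workload: 1.00 per GPU-hour, 40 busy hours per month,
# 200 cold starts of 30 seconds each.
always_on = monthly_cost_always_on(1.00)
scaled = monthly_cost_scale_to_zero(1.00, busy_hours=40, cold_starts=200, cold_start_seconds=30)
print(f"always on: {always_on:.2f}, scale to zero: {scaled:.2f}")
```

With these made-up numbers the always-on instance costs 730.00 per month while the scale-to-zero one costs about 41.67. The gap narrows as utilization rises, so the saving depends on how bursty the traffic is.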

**Maintaining Model and Data Privacy**

Ensuring data remains private and secure

Methods for encrypting communications

GDPR and other privacy compliance considerations

**Performance Optimization**

Techniques for improving inference speed

Balancing cost and performance

Case studies of successful deployment solutions

**Troubleshooting and Support**

Common issues and solutions

Accessing community and support resources

Future-proofing and maintaining systems

**Capstone Project**

Deploy a sample LLM using serverless GPUs

Presentation and evaluation of deployment strategy

**Course Conclusion and Future Directions**

Recap of key concepts

Emerging trends in AI deployment

Opportunities for further learning and exploration

Topics

Computer Science

AI for FP&A Automation & Modeling

FP&A with AI: Capstone Project

Interpretability of LLMs - Generating SAE Feature Descriptions - Spring 2026

CodeCloak: A DRL-Based Method for Mitigating Code Leakage by LLM Code Assistants

Generative AI for NLP with PyTorch

Machine Learning Engineer: ML and Deep Learning Models
