What you need to know before you start

Start 4 June 2026 10:10

End 4 June 2026


AI Orchestration: From local models to cloud

Master AI orchestration across local and cloud environments: build prompt engineering pipelines, deploy models with Ollama and llamafile, optimize GPU inference in Rust, and design cost-effective workflows using AWS Spot instances.
Pragmatic AI Labs via Coursera

Pragmatic AI Labs

2868 courses


5 hours

Optional upgrade available

Beginner

Progress at your own pace

Paid Course


Overview

Learn to orchestrate AI systems across local and cloud environments through hands-on infrastructure setup, model deployment, and workflow integration. You will build a prompt engineering pyramid that progresses from basic prompts to chain-of-thought reasoning implemented in Rust, then evaluate six decision factors for choosing between local and cloud models, including latency, throughput, cost, and privacy.

The course covers local AI infrastructure in depth: running Ollama with custom Modelfiles for task-specific assistants, deploying llamafile for zero-dependency portable inference, compiling Rust Candle with CUDA for GPU-accelerated local inference, and optimizing local RAG with caching strategies. You will configure a complete AI workstation, using tmux for session management and nvidia-smi and Zenith for GPU monitoring, and apply NVIDIA GPU optimizations.

The final module covers cloud workflows: AWS Spot instances for cost-effective GPU compute, model discovery and download from Hugging Face, and integration with GitHub AI models. By completing this course, you will be able to set up local AI infrastructure, deploy models across local and cloud environments, and design orchestration workflows that balance cost, privacy, and performance.

Syllabus

  • Orchestration Fundamentals: Covers prompt engineering with chain-of-thought reasoning, local inference runtimes (Ollama, llamafile, Candle), GPU workstation configuration, and cost-optimized cloud deployment with AWS Spot instances.
  • Local AI Infrastructure: Covers local vs cloud model tradeoffs, caching strategies, local RAG optimization, Ollama with custom Modelfiles, llamafile portable deployment, and Candle GPU-accelerated Rust inference.
  • Workstation and Cloud Workflows: Covers tmux session management, nvidia-smi and Zenith GPU monitoring, local workstation orchestration, AWS Spot instance deployment, Hugging Face and GitHub AI model workflows, and Rust project structure.
  • Capstone: A head-to-head comparison of Ollama and `apr` ([paiml/aprender](https://github.com/paiml/aprender)) running Qwen2.5-Coder-1.5B on the same prompt suite and the same hardware. You will build a chain-of-thought routing engine that selects a runtime based on task complexity and validation requirements, with a cost analysis spanning local workstations, Spot instances, and Amazon Bedrock.

Taught by

Alfredo Deza and Noah Gift


Subjects

Artificial Intelligence