What You Need to Know Before You Start

Starts 6 June 2026 23:46

Ends 6 June 2026


Inference techniques for local and cloud LLM deployment

Master LLM inference techniques for deploying Llama models locally and in the cloud, including quantization, prompt pipelines, and scalable real-world applications.
Meta via Coursera


Not Specified

Optional upgrade available

Not Specified

Progress at your own speed

Paid Course

Overview

Expand your AI development skills by learning to run Llama models on local machines and at scale in the cloud. Master the principles of LLM inference, apply quantization for efficiency, and design prompt pipelines for real-world tasks.

Through hands-on projects, you’ll deploy an LLM-enabled tool that demonstrates scalability, performance, and practical impact.

Syllabus

  • Introduction to Large Language Models (LLMs)
      • Overview of Llama models
      • Key concepts in LLMs: parameters, training, and inference
  • LLM Inference Principles
      • Understanding inference in AI models
      • Differences between training and inference
      • Challenges in LLM inference
  • Local LLM Deployment
      • Setting up a local environment for LLMs
      • Running Llama models on local machines
      • Optimizing local performance
  • Cloud-Based LLM Deployment
      • Introduction to cloud platforms for AI
      • Deploying Llama models in the cloud
      • Managing cloud resources for scalability
  • Efficiency in LLM Inference
      • Techniques for model optimization
      • Understanding and applying quantization
      • Balancing accuracy and efficiency
  • Designing Prompt Pipelines
      • Introduction to prompt engineering
      • Building and testing prompt pipelines
      • Adapting prompts for various tasks
  • Hands-on Projects
      • Project 1: Deploy a Llama model locally
      • Project 2: Scale an LLM on a cloud platform
      • Project 3: Design a prompt pipeline for a specific application
  • Evaluation and Performance Monitoring
      • Metrics for evaluating model performance
      • Tools for monitoring inference efficiency
      • Iterative improvements based on feedback
  • Real-World Applications and Impact
      • Case studies of LLM deployment in industry
      • Scalability and practical implications
      • Ethical considerations in LLM deployment
  • Course Recap and Future Trends
      • Review of key learnings
      • Discussion of emerging technologies and trends in LLMs
      • Guidance on further learning and development in the field of AI

Taught by

Meta Staff


Subjects

Computer Science