What You Need to Know Before You Start

Starts 22 July 2026 15:08

Ends 22 July 2026

Mastering Classic Reinforcement Learning Algorithms

Explore the mathematical foundations of reinforcement learning, covering Markov decision processes, dynamic programming, Q-learning, and SARSA to solve finite decision-making problems using tabular methods.

University of Colorado Boulder via Coursera

5 weeks, 3 hours a week

Optional upgrade available

Intermediate

Progress at your own speed

Paid Course

Overview

How can an agent learn to make good decisions through repeated interaction with an uncertain environment? This course introduces the mathematical and algorithmic foundations of classical reinforcement learning, with an emphasis on finite Markov decision processes and tabular methods.

The course begins with the simplest settings in which the central ideas are clearest: deterministic decision processes, discounted rewards, and Bellman optimality equations. It then introduces stochasticity through Markov chains and Markov decision processes, where learners study policies, value functions, expected discounted reward, and dynamic programming.

With this foundation in place, the course turns to planning methods for known models, including value iteration, policy iteration, and linear programming formulations. The second half of the course studies reinforcement learning when the model is unknown and the agent must learn from sampled experience.

Topics include multi-armed bandits, exploration and exploitation, Monte Carlo methods, temporal-difference learning, SARSA, Q-learning, and convergence principles. The course ends with a final assessment in which learners solve the same finite MDP from both model-based planning and model-free learning perspectives.

By the end of the course, learners will be able to formulate finite decision-making problems as Markov decision processes, solve them using classical planning algorithms, and implement tabular reinforcement-learning algorithms from sampled data. This course provides the foundation for later study of deep reinforcement learning, reward programming, and trustworthy AI systems.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition.

Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.

Learn more:

MS in Artificial Intelligence: https://www.coursera.org/degrees/ms-artificial-intelligence-boulder

MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

Syllabus

Deterministic Decision Processes

This module introduces the modeling and optimization foundations for sequential decision-making in their simplest form: deterministic decision processes with discounted rewards. We begin with states, actions, transitions, and rewards as a language for representing decision problems over time. We then develop value functions and Bellman equations as tools for optimizing long-term return. The goal is to build intuition for why dynamic programming is correct in the simpler setting of deterministic decision processes before introducing stochastic transitions, learning from sampled experience, and bootstrapping in later modules.
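To make the deterministic Bellman equation concrete, here is a minimal sketch, not taken from the course, that repeatedly applies the update V(s) = max_a [r(s, a) + γ V(f(s, a))] until the values stop changing. The two states, two actions, rewards, and discount factor γ = 0.9 are invented for illustration, and the function names (`q_value`, `solve_bellman`) are hypothetical.

```python
# Hypothetical toy deterministic decision process: states, actions, rewards
# and GAMMA are invented for illustration, not taken from the course.
GAMMA = 0.9
STATES = ["unprepared", "prepared"]
ACTIONS = ["study", "relax"]

# Deterministic dynamics: (state, action) -> (next_state, reward)
STEP = {
    ("unprepared", "study"): ("prepared", -1.0),
    ("unprepared", "relax"): ("unprepared", 0.0),
    ("prepared", "study"): ("prepared", 1.0),
    ("prepared", "relax"): ("unprepared", 2.0),
}


def q_value(v, s, a):
    """One-step lookahead: r(s, a) + GAMMA * V(f(s, a))."""
    s_next, r = STEP[(s, a)]
    return r + GAMMA * v[s_next]


def solve_bellman(tol=1e-10, max_iters=10_000):
    """Apply the Bellman optimality update until V stops changing."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_v = {s: max(q_value(v, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(ACTIONS, key=lambda a: q_value(v, s, a)) for s in STATES}
    return v, policy


if __name__ == "__main__":
    values, policy = solve_bellman()
    print(values, policy)
```

With these made-up numbers, studying in both states is optimal: V(prepared) = 1 / (1 - 0.9) = 10 and V(unprepared) = -1 + 0.9 * 10 = 8. Because the discount makes the update a contraction, the iteration converges from any starting values.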

Markov Chains and Markov Decision Processes

This module adds stochasticity to the deterministic picture developed in the previous module. Learners continue with the surprise-quiz example, now with uncertain outcomes: studying usually helps but may not always help, and relaxing may reduce preparation but may not always do so. The module first introduces stochastic transitions as probability distributions over next states, then studies Markov chains as stochastic systems without choices, and finally adds actions to obtain Markov decision processes. The goal is to make expected discounted reward, policies, and Bellman equations feel like natural extensions of the deterministic setting.
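For a Markov chain (no choices), the expected discounted reward v satisfies the Bellman equation v = R + γPv. The sketch below, an illustration rather than course material, computes v by iterating that equation on a hypothetical two-state chain loosely inspired by the study/relax framing above; the transition probabilities, per-state rewards, γ = 0.9, and the function name `expected_discounted_reward` are all assumptions.

```python
# Hypothetical two-state Markov chain (no actions); numbers are illustrative only.
GAMMA = 0.9

# P[s][s2] = probability of moving from s to s2 in one step
P = {
    "unprepared": {"unprepared": 0.2, "prepared": 0.8},
    "prepared": {"unprepared": 0.3, "prepared": 0.7},
}

# Reward received in each state
R = {"unprepared": -1.0, "prepared": 1.0}


def expected_discounted_reward(P, R, gamma, tol=1e-12, max_iters=100_000):
    """Iterate v <- R + gamma * P v, the Bellman equation for a Markov chain."""
    # Each row of P must be a probability distribution over next states
    for s, row in P.items():
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"transition probabilities from {s!r} do not sum to 1")
    v = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_v = {
            s: R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items())
            for s in P
        }
        delta = max(abs(new_v[s] - v[s]) for s in P)
        v = new_v
        if delta < tol:
            break
    return v


if __name__ == "__main__":
    print(expected_discounted_reward(P, R, GAMMA))
```

For these numbers the fixed point is v(unprepared) = 350/109 (about 3.21) and v(prepared) = 550/109 (about 5.05), which is also the solution of the linear system (I - γP)v = R.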

Dynamic Programming in MDPs

This module focuses on known-model optimization. Learners use Bellman equations as computational tools for policy evaluation, policy improvement, value iteration, policy iteration, and linear programming formulations of discounted MDPs.
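As one example of known-model optimization, the sketch below runs policy iteration (iterative policy evaluation followed by greedy policy improvement) on a hypothetical two-state, two-action MDP. The model, rewards, γ = 0.9, and all function names are invented for illustration; value iteration and the linear programming formulation mentioned above are not shown.

```python
# Hypothetical two-state, two-action MDP; all numbers are invented for illustration.
GAMMA = 0.9
STATES = ["unprepared", "prepared"]
ACTIONS = ["study", "relax"]

# MODEL[(s, a)] = (expected reward, {next_state: probability})
MODEL = {
    ("unprepared", "study"): (-1.0, {"prepared": 0.8, "unprepared": 0.2}),
    ("unprepared", "relax"): (0.0, {"prepared": 0.1, "unprepared": 0.9}),
    ("prepared", "study"): (1.0, {"prepared": 0.9, "unprepared": 0.1}),
    ("prepared", "relax"): (2.0, {"prepared": 0.5, "unprepared": 0.5}),
}


def q_value(v, s, a):
    """Expected one-step return: r(s, a) + GAMMA * sum over s' of P(s' | s, a) V(s')."""
    r, dist = MODEL[(s, a)]
    return r + GAMMA * sum(p * v[s2] for s2, p in dist.items())


def evaluate(policy, tol=1e-12):
    """Iterative policy evaluation: apply the Bellman expectation update until it settles."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: q_value(v, s, policy[s]) for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            return v


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: ACTIONS[0] for s in STATES}  # arbitrary initial policy
    while True:
        v = evaluate(policy)
        stable = True
        for s in STATES:
            best = max(ACTIONS, key=lambda a: q_value(v, s, a))
            # Switch only on a strict improvement, so ties cannot cause cycling
            if q_value(v, s, best) > q_value(v, s, policy[s]) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


if __name__ == "__main__":
    print(policy_iteration())
```

Starting from "always study", one improvement step switches the prepared state to "relax", after which the policy is stable, with values 890/127 (about 7.01) and 1190/127 (about 9.37).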

Learning from Sampled Experience

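This module begins the transition from planning to reinforcement learning. When the model is unknown, the agent must learn from sampled experience; the module introduces this setting through multi-armed bandits, the trade-off between exploration and exploitation, and Monte Carlo methods.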
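Multi-armed bandits are the simplest setting for learning from sampled rewards. The sketch below uses epsilon-greedy action selection with incremental sample-average estimates, one common exploration rule chosen here as an assumption rather than the course's specific method. The Bernoulli arm means, step count, epsilon, and the function name `epsilon_greedy_bandit` are hypothetical.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit with the given (hypothetical) arm means."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # sample-average estimate of each arm's mean reward
    counts = [0] * k       # number of times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a uniformly random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit current estimates
        # Bernoulli reward: 1 with probability true_means[arm], else 0
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (R - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    est, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print(est, counts, total)
```

The occasional random pulls keep every arm's estimate improving, so the agent eventually exploits the best arm most of the time; with epsilon = 0 it could lock onto whichever arm happened to look good first.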

Control, Exploration, and Tabular RL Algorithms

This module completes the tabular reinforcement-learning part of Course 1. Module 4 introduced sample-based learning through bandits and Monte Carlo methods. Module 5 introduces temporal-difference learning: updating after one sampled transition by combining an observed reward with a bootstrapped value estimate. The module ends by summarizing tabular reinforcement learning and motivating the transition to function approximation and deep RL.
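The TD update described above can be sketched as tabular TD(0) prediction on a hypothetical two-state Markov chain: after each sampled transition, V(s) moves toward the bootstrapped target r + γV(s'). The chain, rewards, γ = 0.9, constant step size alpha, and the function name `td0` are assumptions for illustration; SARSA and Q-learning apply the same idea to action values.

```python
import random

# Hypothetical two-state Markov chain; numbers are illustrative, not from the course.
GAMMA = 0.9
P = {
    "unprepared": {"unprepared": 0.2, "prepared": 0.8},
    "prepared": {"unprepared": 0.3, "prepared": 0.7},
}
R = {"unprepared": -1.0, "prepared": 1.0}  # reward received in each state


def td0(steps=200_000, alpha=0.002, seed=0):
    """Estimate state values from one long sampled trajectory using TD(0)."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in P}
    s = "unprepared"
    for _ in range(steps):
        # Sample the next state from the transition distribution P[s]
        s_next = rng.choices(list(P[s]), weights=list(P[s].values()))[0]
        # TD error: bootstrapped target r + GAMMA * V(s') minus current estimate V(s)
        td_error = R[s] + GAMMA * v[s_next] - v[s]
        v[s] += alpha * td_error
        s = s_next
    return v


if __name__ == "__main__":
    print(td0())
```

For this chain the exact values are 350/109 (about 3.21) and 550/109 (about 5.05). With a constant step size the TD(0) estimates hover near these values rather than converging exactly; a suitably decaying step size is needed for exact convergence.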

Taught by

Ashutosh Trivedi

Subjects

Computer Science
