What You Need to Know Before You Start

Starts 1 July 2026 10:07

Ends 1 July 2026


Mastering Classic Reinforcement Learning Algorithms

Explore the mathematical foundations of reinforcement learning, covering Markov decision processes, dynamic programming, Q-learning, and SARSA to solve finite decision-making problems using tabular methods.
University of Colorado Boulder via Coursera

University of Colorado Boulder

40 Courses


The University of Colorado Boulder, often referred to as CU Boulder, offers a wide range of educational programs and courses, both in-person and online. Students can choose from a variety of study tracks, including courses in the arts, science, engineering, business and more.

Online University of Colorado Boulder Courses

One of the key benefits of CU Boulder is the ability to take courses online. This is a great opportunity for students who want a quality education but prefer flexibility in their schedule. CU Boulder's online courses provide access to highly qualified faculty and the most up-to-date materials.

Summer courses at the University of Colorado Boulder

Summer is a great time to explore new topics and expand your knowledge. The University of Colorado Boulder offers a variety of summer courses both online and in-person. This is an excellent opportunity for students to spend their summer usefully by studying subjects of interest.

Best CU Boulder Courses for Students

CU Boulder not only offers a wide variety of programs, but also a high-quality education. Students can choose from a variety of courses, from basic to advanced, to develop their skills and interests. The university actively uses innovative approaches to teaching, such as AI Education, which helps students gain up-to-date knowledge.

Free Courses at CU Boulder

CU Boulder's free courses let students expand their knowledge across many areas. They are available to both beginners and advanced students, so everyone can find an option that suits their interests.

Advantages of studying online at the University of Colorado Boulder

The University of Colorado Boulder provides students with a unique opportunity to study through online courses, which has a number of significant advantages.

Firstly, online courses at the University of Colorado Boulder offer a flexible schedule. Students can choose their own time to study material and watch lectures, making it easier for them to balance their studies with other responsibilities such as work or family commitments. This flexibility makes education more accessible to a wider range of people.

Secondly, University of Colorado Boulder courses provide students with the opportunity to study unique material presented by experienced teachers. Through access to experts in various fields of knowledge, students can gain relevant knowledge and skills that will be useful in the modern world.

The third benefit of taking CU Boulder online courses is the opportunity to connect and collaborate with other students from different countries and cultures. This contributes to an enriching educational experience by allowing students to be exposed to different points of view and broaden their horizons.

Additionally, University of Colorado Boulder online courses typically offer a variety of interactive learning materials, making the learning process more fun and effective. Students can learn through video lectures, tests, forums, and other innovative methods that stimulate learning.

Thus, taking University of Colorado Boulder online courses offers students many benefits, including flexible scheduling, access to experts, international communication, and an interactive educational approach. This is an excellent opportunity for students to receive a quality education, expand their knowledge and skills, and prepare for the challenges of the modern world.

Conclusion

The University of Colorado Boulder is a place where students can receive a quality education with a variety of courses and programs to choose from. Whether you're looking for online or in-person training, summer courses or free programs, CU Boulder offers ample opportunities for development and learning!

5 weeks, 3 hours a week

Optional upgrade available

Intermediate

Progress at your own speed

Paid Course

Overview

How can an agent learn to make good decisions through repeated interaction with an uncertain environment? This course introduces the mathematical and algorithmic foundations of classical reinforcement learning, with an emphasis on finite Markov decision processes and tabular methods.

The course begins with the simplest settings in which the central ideas are clearest: deterministic decision processes, discounted rewards, and Bellman optimality equations. It then introduces stochasticity through Markov chains and Markov decision processes, where learners study policies, value functions, expected discounted reward, and dynamic programming.
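For orientation, the Bellman optimality equations mentioned here can be written in a common textbook form. The notation is an assumption for illustration and is not taken from the course materials: V^* is the optimal value function, r(s, a) the immediate reward, \gamma the discount factor with 0 \le \gamma < 1, f(s, a) the deterministic next state, and P(s' \mid s, a) the transition probability in the stochastic case.

```latex
% Deterministic case: action a in state s leads to the single next state f(s, a)
V^*(s) = \max_{a} \bigl[ r(s, a) + \gamma \, V^*(f(s, a)) \bigr]

% Stochastic case (MDP): average over next states s' with probability P(s' \mid s, a)
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \bigl[ r(s, a) + \gamma \, V^*(s') \bigr]
```

The deterministic equation is the special case of the stochastic one in which P(s' \mid s, a) puts all of its mass on s' = f(s, a).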

With this foundation in place, the course turns to planning methods for known models, including value iteration, policy iteration, and linear programming formulations. The second half of the course studies reinforcement learning when the model is unknown and the agent must learn from sampled experience.
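As a rough illustration of planning with a known model, here is a minimal value iteration sketch in plain Python. The data layout (`transitions[s][a]` as a list of `(probability, next_state)` pairs, `rewards[s][a]` as an expected immediate reward) and the two-state example are hypothetical choices made for this sketch, not the course's own code or examples.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10, max_iters=100_000):
    """Approximate the optimal values and a greedy policy of a finite MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward (assumed layout).
    """
    n_states = len(transitions)
    values = [0.0] * n_states

    def q_value(s, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])

    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        new_values = [
            max(q_value(s, a) for a in range(len(transitions[s])))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values[:] = new_values  # update in place so q_value sees the new values
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(transitions[s])), key=lambda a: q_value(s, a))
        for s in range(n_states)
    ]
    return values, policy


# Hypothetical two-state example: state 0 = "unprepared", state 1 = "prepared".
# Action 0 = relax, action 1 = study. State 1 is absorbing with reward 2 per step.
transitions = [
    [[(1.0, 0)], [(0.2, 0), (0.8, 1)]],
    [[(1.0, 1)], [(1.0, 1)]],
]
rewards = [[1.0, 0.0], [2.0, 2.0]]

values, policy = value_iteration(transitions, rewards, gamma=0.9)
print(values, policy)
```

In this toy problem the prepared state is worth 2 / (1 - 0.9) = 20, and studying in the unprepared state is worth 14.4 / 0.82, about 17.56, which beats relaxing forever (worth 10). Policy iteration and linear programming reach the same optimal values by different routes.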

Topics include multi-armed bandits, exploration and exploitation, Monte Carlo methods, temporal-difference learning, SARSA, Q-learning, and convergence principles. The course ends with a final assessment in which learners solve the same finite MDP from both model-based planning and model-free learning perspectives.
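To make the exploration and exploitation trade-off concrete, the following is a minimal epsilon-greedy sketch for a Bernoulli multi-armed bandit. The function name, the arm success probabilities, and the parameter values are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate arm values of a Bernoulli bandit with epsilon-greedy selection.

    true_means[i] is the (hypothetical) success probability of arm i.
    Returns the sample-average estimates and the pull counts per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a uniformly random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample average: new = old + (reward - old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.8])
print(estimates, counts)
```

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps the estimates of the other arms from going stale.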

By the end of the course, learners will be able to formulate finite decision-making problems as Markov decision processes, solve them using classical planning algorithms, and implement tabular reinforcement-learning algorithms from sampled data. This course provides the foundation for later study of deep reinforcement learning, reward programming, and trustworthy AI systems.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition.

Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.

Learn more:

MS in Artificial Intelligence: https://www.coursera.org/degrees/ms-artificial-intelligence-boulder

MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

Syllabus

  • Deterministic Decision Processes
    This module introduces the modeling and optimization foundations for sequential decision-making in their simplest form: deterministic decision processes with discounted rewards. We begin with states, actions, transitions, and rewards as a language for representing decision problems over time. We then develop value functions and Bellman equations as tools for optimizing long-term return. The goal is to build intuition for why dynamic programming is correct in the simpler setting of deterministic decision processes before introducing stochastic transitions, learning from sampled experience, and bootstrapping in later modules.
  • Markov Chains and Markov Decision Processes
    This module adds stochasticity to the deterministic picture developed in the previous module. Learners continue with the surprise-quiz example, now with uncertain outcomes: studying usually helps but may not always help, and relaxing may reduce preparation but may not always do so. The module first introduces stochastic transitions as probability distributions over next states, then studies Markov chains as stochastic systems without choices and finally adds actions to obtain Markov decision processes. The goal is to make expected discounted reward, policies, and Bellman equations feel like natural extensions of the deterministic setting.
  • Dynamic Programming in MDPs
    This module focuses on known-model optimization. Learners use Bellman equations as computational tools for policy evaluation, policy improvement, value iteration, policy iteration, and linear programming formulations of discounted MDPs.
  • Learning from Sampled Experience
    This module begins the transition from planning to reinforcement learning. In
  • Control, Exploration, and Tabular RL Algorithms
    This module completes the tabular reinforcement-learning part of Course 1. Module 4 introduced sample-based learning through bandits and Monte Carlo methods. Module 5 introduces temporal-difference learning: updating after one sampled transition by combining an observed reward with a bootstrapped value estimate. The module ends by summarizing tabular reinforcement learning and motivating the transition to function approximation and deep RL.
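The one-transition update described in the last module can be sketched as follows for the two tabular control algorithms named in the overview, SARSA and Q-learning. The table layout (`Q[state][action]` as nested lists), the function names, and the step size and discount values are assumptions for illustration. Both functions assume the next state is non-terminal; at a terminal state the bootstrapped term would be zero.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# One hypothetical sampled transition: in state 0, action 1 gave reward 2
# and led to state 1, where the behaviour policy then chose action 0.
Q_sarsa = [[0.0, 0.0], [1.0, 3.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 3.0]]

sarsa_update(Q_sarsa, s=0, a=1, r=2.0, s_next=1, a_next=0)
q_learning_update(Q_qlearn, s=0, a=1, r=2.0, s_next=1)

print(Q_sarsa[0][1], Q_qlearn[0][1])
```

Here the SARSA target is 2 + 0.9 * 1.0 = 2.9 and the Q-learning target is 2 + 0.9 * 3.0 = 4.7, so after a step of size 0.1 the updated entries are roughly 0.29 and 0.47. The only difference between the two rules is which next-state value enters the target.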

Taught by

Ashutosh Trivedi


Subjects

Computer Science