What you should know before you begin

Starts 1 July 2026 10:09

Ends 1 July 2026


Deep Reinforcement Learning: From Theory to Practice

Master deep reinforcement learning from function approximation to modern algorithms like PPO, DDPG, and SAC, implementing stable agents for high-dimensional environments including games and robotics.
University of Colorado Boulder via Coursera

University of Colorado Boulder

40 courses


The University of Colorado Boulder, often referred to as CU Boulder, offers a wide range of educational programs and courses, both in-person and online. Students can choose from a variety of study tracks, including courses in the arts, science, engineering, business and more.

Online University of Colorado Boulder Courses

One of the key benefits of CU Boulder is the ability to take courses online. This is a great opportunity for students who want a quality education but prefer flexibility in their schedule. CU Boulder's online courses provide access to highly qualified faculty and the most up-to-date materials.

Summer courses at the University of Colorado Boulder

Summer is a good time to explore new topics and expand your knowledge. The University of Colorado Boulder offers a variety of summer courses, both online and in-person, so students can make productive use of the summer by studying subjects that interest them.

Best CU Boulder Courses for Students

CU Boulder offers both a wide range of programs and a high-quality education. Students can choose courses from basic to advanced to develop their skills and interests. The university uses innovative teaching approaches, such as AI Education, to help students gain up-to-date knowledge.

Free Courses at CU Boulder

CU Boulder's free courses let students expand their knowledge in many areas. They are available to both beginners and advanced students, so everyone can find an option that suits their interests.

Advantages of studying online courses at the University of Colorado Boulder

The University of Colorado Boulder provides students with a unique opportunity to study through online courses, which has a number of significant advantages.

The first advantage of University of Colorado Boulder online courses is the flexible schedule. Students can choose when to study the material and watch lectures, which makes it easier to balance their studies with other responsibilities such as work or family commitments. This flexibility makes education accessible to a wider range of people.

Secondly, University of Colorado Boulder courses provide students with the opportunity to study unique material presented by experienced teachers. Through access to experts in various fields of knowledge, students can gain relevant knowledge and skills that will be useful in the modern world.

The third benefit of taking CU Boulder online courses is the opportunity to connect and collaborate with other students from different countries and cultures. This contributes to an enriching educational experience by allowing students to be exposed to different points of view and broaden their horizons.

Additionally, University of Colorado Boulder online courses typically offer a variety of interactive learning materials, making the learning process more fun and effective. Students can learn through video lectures, tests, forums, and other innovative methods that stimulate learning.

Thus, taking University of Colorado Boulder online courses offers students many benefits, including flexible scheduling, access to experts, international communication, and an interactive educational approach. This is an excellent opportunity for students to receive a quality education, expand their knowledge and skills, and prepare for the challenges of the modern world.

Conclusion

The University of Colorado Boulder is a place where students can receive a quality education with a variety of courses and programs to choose from. Whether you're looking for online or in-person training, summer courses or free programs, CU Boulder offers ample opportunities for development and learning!

6 weeks, 2 hours a week

Optional upgrade available

Intermediate

Learn at your own pace

Paid Course

Overview

How can reinforcement learning scale beyond small tabular problems to high-dimensional environments such as games, robotics, and autonomous decision-making? This course introduces deep reinforcement learning, where reinforcement-learning algorithms are combined with neural-network-based function approximation.

Learners begin by studying why tabular methods break down in large or continuous state spaces and how value functions, action-value functions, and policies can be represented by parameterized models. The course then develops value-based deep reinforcement learning methods, including fitted value iteration, Deep Q-Networks, replay buffers, target networks, Double DQN, dueling networks, and prioritized experience replay.
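
To make the role of the target network concrete, here is a minimal NumPy sketch of how bootstrapped regression targets might be computed for a batch of transitions, first in the standard DQN form and then in the Double DQN form. The function names, the toy batch, and the discount value are illustrative assumptions, not taken from the course materials; the arrays of next-state Q-values stand in for the outputs of an online and a target network.

```python
import numpy as np

def dqn_targets(rewards, dones, next_q_target, gamma=0.99):
    """Standard DQN target: r + gamma * max_a Q_target(s', a).

    The bootstrap term is zeroed for terminal transitions (dones == 1).
    """
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

# Hypothetical batch of three transitions with two actions each.
rewards = np.array([1.0, 0.0, -1.0])
dones = np.array([0.0, 0.0, 1.0])
next_q_online = np.array([[0.5, 2.0], [1.0, 0.2], [3.0, 4.0]])
next_q_target = np.array([[1.5, 1.0], [0.4, 0.9], [2.0, 5.0]])

print(dqn_targets(rewards, dones, next_q_target))
print(double_dqn_targets(rewards, dones, next_q_online, next_q_target))
```

In this toy batch the Double DQN targets for the non-terminal transitions are lower than the standard DQN targets, because the action chosen by the online network is not the one the target network rates highest. This is the mechanism by which Double DQN is usually said to reduce overestimation.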

Learners also study direct policy optimization through policy-gradient methods such as REINFORCE, as well as actor–critic methods that combine policy optimization with value estimation. The course introduces selected modern deep RL algorithms, such as PPO, DDPG, and SAC, with emphasis on implementation, stability, diagnosis, and empirical evaluation.
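
As a rough illustration of REINFORCE, the sketch below computes discounted Monte Carlo returns and the per-step gradient of the policy-gradient objective with respect to the action logits. It assumes a softmax policy over discrete actions and a constant baseline; all function names and parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * r_{t+1} + ... for one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def score(logits, action):
    """Gradient of log softmax(logits)[action] with respect to the logits."""
    grad = -softmax(logits)
    grad[action] += 1.0
    return grad

def reinforce_gradient(logits_per_step, actions, rewards, gamma=0.99, baseline=0.0):
    """Per-step REINFORCE gradient terms: score_t * (G_t - baseline).

    Summing the rows gives a Monte Carlo estimate of the ascent direction
    on the logits (assumption: one logit vector per time step).
    """
    logits_per_step = np.asarray(logits_per_step, dtype=float)
    returns = discounted_returns(rewards, gamma)
    grads = np.zeros_like(logits_per_step)
    for t, (logits, action) in enumerate(zip(logits_per_step, actions)):
        grads[t] = score(logits, action) * (returns[t] - baseline)
    return grads
```

Subtracting a baseline changes only the scale of each term, not the expected gradient, which is why it can reduce variance without introducing bias. Replacing the constant baseline with a learned value estimate leads to the actor–critic methods mentioned above.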

By the end of the course, learners will be able to implement deep reinforcement-learning agents, diagnose common sources of instability, evaluate learned behavior using suitable experimental protocols, and report results in a reproducible way. This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform.

These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history.

CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:

MS in Artificial Intelligence: https://www.coursera.org/degrees/ms-artificial-intelligence-boulder

MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

Syllabus

  • Function Approximation for RL
  • This module introduces function approximation as the transition point from tabular reinforcement learning to deep reinforcement learning. In Course 1, we represented values explicitly using tables: V(s), Q(s, a). This works when the state and action spaces are small enough to enumerate. But many reinforcement-learning problems have large, continuous, high-dimensional, or image-like observations. In such settings, tables are not enough. Course 2 replaces tables with parameterized functions: V_θ(s), Q_θ(s, a), π_θ(a | s). The parameter vector θ may parameterize a linear model, a neural network, or another differentiable function class. The central question of this module is: How do we learn value functions when tables are too large? The module also explains why deep RL is not merely supervised learning applied to RL data. The targets are noisy, bootstrapped, policy-dependent, and often moving as the parameters change. These difficulties lead to the deadly triad: function approximation, bootstrapping, and off-policy learning. The module ends with fitted value iteration as a bridge from tabular value iteration to deep Q-learning.
  • Deep Q-Learning and Value-Based Deep RL
  • This module develops value-based deep reinforcement learning as bootstrapped regression. In the previous module, we replaced tabular value functions with parameterized functions: V_θ(s), Q_θ(s, a), π_θ(a | s). We also saw that function approximation changes the learning problem: values are no longer stored independently, targets can move as parameters change, and bootstrapped updates can become unstable. This module applies these ideas to deep action-value learning. We begin with fitted value iteration, which turns Bellman updates into regression problems. We then study Deep Q-Networks, or DQN, where a neural network represents Q_θ(s, a). DQN combines Q-learning targets with two important stabilizers: replay buffers and target networks. Finally, we study common DQN variants: Double DQN, dueling networks, and prioritized replay. The goal is to understand DQN not as a mysterious deep-learning recipe, but as Q-learning plus function approximation, bootstrapped targets, replay, and stabilization.
  • Policy Gradients and REINFORCE
  • This module introduces policy-gradient methods, a family of reinforcement-learning algorithms that optimize a parameterized policy directly rather than deriving behavior from a learned value function. Starting from the motivation for direct policy learning, the module develops the policy-gradient objective, the score-function trick that makes this objective differentiable from sampled experience, and REINFORCE, the foundational Monte Carlo policy-gradient algorithm. The module then introduces baselines as a practical variance-reduction technique and closes by motivating actor-critic methods as the natural next step once a learned baseline is introduced.
  • Actor-Critic Methods
  • REINFORCE updates the policy directly from sampled Monte Carlo returns, but those returns are noisy — the same policy can produce wildly different outcomes from episode to episode. This module introduces actor–critic methods, which tame that variance by learning a second component, the critic, that estimates how good a state or action is and feeds that estimate back into the policy update as a baseline. Learners will see how subtracting a learned value function from the return produces an advantage signal, how that signal generalizes from the one-step TD error to the multi-step Generalized Advantage Estimator, and how actor and critic are jointly trained via separate policy and value losses. The module closes by tracing the conceptual line from basic actor–critic methods to PPO, motivating why controlling the size of policy updates matters for stable learning.
  • Modern Deep RL: PPO, DDPG, and SAC
  • This module surveys modern deep reinforcement learning algorithms through the lens of stability, exploration, and continuous control. In the previous module, we studied policy-gradient and actor–critic methods. Vanilla policy-gradient updates can be brittle. If the policy changes too much after one update, the new policy may perform much worse than the old one, and the data collected under the old policy may no longer be reliable for updating the new one. This module studies three major algorithmic ideas. First, we study conservative policy updates through TRPO and PPO. The main idea is to improve the policy while preventing overly large policy changes. PPO implements this idea using a simple clipped surrogate objective. Second, we study DDPG, a deterministic actor–critic method for continuous-control problems. Third, we study SAC, an entropy-regularized actor–critic method that encourages exploration and often improves robustness.
  • Practical Deep RL Implementation
  • This module turns deep reinforcement learning algorithms into implementation patterns. Earlier modules introduced the main algorithmic ideas: function approximation, DQN, policy gradients, actor–critic methods, PPO, DDPG, and SAC. This module asks how those ideas become working code. A deep RL implementation is not just a neural-network training loop. In supervised learning, the data are usually given in a fixed dataset. In reinforcement learning, the data are generated by an agent interacting with an environment. This means the implementation must manage environment interaction, exploration, neural-network models, optimizers, replay buffers or trajectory buffers, target networks, logging, evaluation, and reproducibility.
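
Two of the ideas named in the Actor-Critic and Modern Deep RL modules, the Generalized Advantage Estimator and PPO's clipped surrogate objective, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: a single trajectory with no terminal state in its interior, value estimates supplied as plain arrays, and hypothetical function names and default hyperparameters that are not taken from the course.

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    values[t] is the critic's estimate V(s_t); last_value is the estimate
    for the state after the final step (use 0.0 if the episode ended).
    """
    horizon = len(rewards)
    advantages = np.zeros(horizon)
    next_value = last_value
    running = 0.0
    for t in reversed(range(horizon)):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized), averaged over samples."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

Setting `lam=0.0` reduces the estimator to the one-step TD error, while `lam=1.0` recovers the Monte Carlo return minus the value estimate. In the clipped objective, a probability ratio above `1 + clip_eps` earns no extra credit when the advantage is positive, which is one way of discouraging overly large policy changes.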

Taught by

Ashutosh Trivedi


Subjects

Computer Science