This series of posts intends to provide a TL;DR for every chapter of the 2nd edition of the book Reinforcement Learning: An Introduction by Sutton & Barto, published in 2018.

All the posts will follow the book’s structure in bullet points, sometimes with additional explanations. This series should be accessible to anyone interested in Reinforcement Learning, and is particularly useful for people who have already read the book and need a refresher on a specific topic without re-reading everything.

It stems from a personal observation: sometimes, all we need to get back into a topic is to read a chapter’s headlines.

This post provides an entry point for all the per-chapter posts:

Tabular methods

chapter 01 - Introduction
chapter 02 - Multi-armed bandits
chapter 03 - Finite Markov decision processes
chapter 04 - Dynamic programming
chapter 05 - Monte Carlo methods
chapter 06 - Temporal-difference learning
chapter 07 - N-step bootstrapping
chapter 08 - Planning and learning with tabular methods

Approximate methods

chapter 09 - On-policy prediction
chapter 10 - On-policy control
chapter 11 - Off-policy methods with approximation
chapter 12 - Eligibility Traces
chapter 13 - Policy gradient methods

I will not cover part 3, “Looking Deeper”, because it treats subjects like psychology and the link between RL and the actual human brain, which I think are better read directly from the book. Summarizing them would probably denature the thinking sparked by the original material.