The most comprehensive RL overview I've ever seen.
Kevin Murphy from Google DeepMind, who has over 128k citations, wrote this.
What makes this different from other RL resources:
→ It bridges classical RL with the modern LLM era
There's an entire chapter dedicated to "LLMs and RL" covering:
- RLHF, RLAIF, and reward modeling
- PPO, GRPO, DPO, RLOO, REINFORCE++
- Training reasoning models
- Multi-turn RL for agents
- Test-time compute scaling
→ The fundamentals are crystal clear
Every major family of algorithms, including value-based methods, policy gradients, and actor-critic, is explained with mathematical rigor.
→ Model-based RL and world models get proper coverage
It covers Dreamer, MuZero, MCTS, and more. This is exactly where the field is heading.
→ Multi-agent RL gets its own section
It covers game theory, Nash equilibrium, and MARL for LLM agents.
I have shared the arXiv paper in the replies!