I am pleased to announce another update to my RL tutorial. This time I have added code for RL fine-tuning (RLFT) of multi-turn LLM agents, using the awesome Tinker library from @thinkymachines and the simple ReBN training loop from GEM by @zzlccc et al. With ~100 lines of simple Python running on your laptop, you can train an agent based on Qwen3-4B-Instruct to play "guess the number" in 20 minutes.
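
To give a feel for the setup, here is a minimal sketch of a "guess the number" environment plus a batch-normalized return computation. It is not the tutorial's code and does not use Tinker or GEM: the class `GuessTheNumber`, the helpers `discounted_returns` and `normalize_batch`, the number range, the turn limit, and the discount factor are all hypothetical, and the normalization step assumes ReBN means normalizing per-turn returns across the batch, which is my reading rather than something stated above. A binary-search policy stands in for the LLM agent.

```python
import random
import statistics


class GuessTheNumber:
    """Hypothetical multi-turn environment: guess a hidden integer in [low, high]."""

    def __init__(self, low=1, high=20, max_turns=8, seed=None):
        self.low, self.high, self.max_turns = low, high, max_turns
        self.rng = random.Random(seed)

    def reset(self):
        self.target = self.rng.randint(self.low, self.high)
        self.turn = 0
        return f"Guess a number between {self.low} and {self.high}."

    def step(self, guess):
        """Return (observation, reward, done) for one guess."""
        self.turn += 1
        if guess == self.target:
            return "Correct!", 1.0, True
        hint = "higher" if guess < self.target else "lower"
        done = self.turn >= self.max_turns
        return f"The number is {hint}.", 0.0, done


def discounted_returns(rewards, gamma=0.9):
    """Per-turn return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def normalize_batch(episode_returns, eps=1e-8):
    """Normalize every per-turn return by the batch-wide mean and std (assumed ReBN-style)."""
    flat = [g for ep in episode_returns for g in ep]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    return [[(g - mean) / (std + eps) for g in ep] for ep in episode_returns]


def binary_search_policy(env):
    """Stand-in for the LLM agent: plays one episode and returns its rewards."""
    lo, hi = env.low, env.high
    env.reset()
    rewards, done = [], False
    while not done:
        guess = (lo + hi) // 2
        obs, reward, done = env.step(guess)
        rewards.append(reward)
        if "higher" in obs:
            lo = guess + 1
        elif "lower" in obs:
            hi = guess - 1
    return rewards


if __name__ == "__main__":
    batch = [binary_search_policy(GuessTheNumber(seed=s)) for s in range(4)]
    advantages = normalize_batch([discounted_returns(r) for r in batch])
    for rewards, adv in zip(batch, advantages):
        print(len(rewards), "turns ->", [round(a, 2) for a in adv])
```

In a real training loop the normalized values would weight the log-probabilities of the agent's tokens at each turn; here they are only printed.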