Thinking Machines
@thinkymachines
Thinking, beeping, and booping. @tinkerapi
Joined February 2025
1 Following    149.1K Followers
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost.