Bolian Li (@lblaoke): Scaling up RL training with more data often runs into performance saturation, which wastes compute. We find that a precisely crafted entropy curve is all you need to avoid this saturation, and we achieve it purely through rejection sampling. https://t.co/QxNFYW5ZsT

PhD Candidate @PurdueCS | Interning @Apple MLR | Reinforcement Learning, Bayesian Deep Learning, Large Language Models

Bolian Li@lblaoke

2026.05.13 07:05
