Bolian Li
@lblaoke
PhD Candidate @PurdueCS | Interning @Apple MLR | Reinforcement Learning, Bayesian Deep Learning, Large Language Models
183 Following · 83 Followers
Scaling up RL training with more data often runs into performance saturation, which wastes compute. We find that a precisely crafted entropy curve is all you need to avoid this saturation, and we achieve it purely through rejection sampling.