Stephen Roller
@stephenroller
MTS @thinkymachines. previously pre-training @googledeepmind, @character_ai, and @aiatmeta.
Joined February 2008
1.3K Following    5.7K Followers
Some teams use sweeps, heuristics, or scaling laws to determine their training LR. At Character, we just have Noam Shazeer dial it to the right value.