Stephen Roller
@stephenroller
MTS @thinkymachines. previously pre-training @googledeepmind, @character_ai, and @aiatmeta.
Some teams use sweeps, heuristics, or scaling laws to determine their training LR. At Character, we just have Noam Shazeer dial it to the right value.