Keller Jordan
@kellerjordan0
CIFAR-10 fanatic
Pretraining @OpenAI OpCo LLC.
Joined March 2016
428 Following    16.9K Followers
New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps.