Keller Jordan
@kellerjordan0
CIFAR-10 fanatic Pretraining @OpenAI OpCo LLC.
Joined March 2016
428 Following    16.9K Followers
New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps.