Jiayi Weng
@Trinkle23897
MTS @openai, author of the entire post-training RL infra, core contributor of ChatGPT/GPT4/GPT4o etc. 30U30
Joined June 2014
177 Following    11.6K Followers
Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm.