Jiayi Weng (@Trinkle23897): Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm.

Jiayi Weng

@Trinkle23897

MTS @openai, author of the entire post-training RL infra, core contributor of ChatGPT/GPT4/GPT4o etc. 30U30

Joined June 2014

177 Following, 11.6K Followers


Posted 2026.05.08 03:49

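To make "programmatic policy" concrete: it is ordinary code that maps observations to actions, with no learned weights. The sketch below is a hypothetical, hand-written Breakout rule, not the policy Codex produced. It assumes the ball's and paddle's x-positions (and the ball's horizontal velocity) have already been extracted from the frame, and the `Action` enum, `Observation` fields, `deadband`, and `lookahead` are illustrative names, not a real environment's API or action indices.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action set; a real Breakout environment defines its own indices.
    NOOP = 0
    LEFT = 1
    RIGHT = 2


@dataclass
class Observation:
    # Hypothetical pre-extracted features; a real agent would parse these from pixels.
    paddle_x: float
    ball_x: float
    ball_dx: float  # horizontal ball velocity, in pixels per step


def breakout_policy(obs: Observation, deadband: float = 2.0, lookahead: int = 3) -> Action:
    """Hand-written rule: steer the paddle toward where the ball will be shortly."""
    # Linear extrapolation of the ball's x-position a few steps ahead.
    target_x = obs.ball_x + lookahead * obs.ball_dx
    error = target_x - obs.paddle_x
    # Move only when the paddle is clearly off target, to avoid jitter.
    if error > deadband:
        return Action.RIGHT
    if error < -deadband:
        return Action.LEFT
    return Action.NOOP
```

For example, with the paddle at x=80 and the ball at x=100 drifting left by 1 pixel per step, the extrapolated target is 97, so the rule returns `Action.RIGHT`. Every decision is readable and editable by hand, which is what distinguishes such a policy from a neural network; the flip side is that someone (here, Codex) has to write and maintain the rules.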
