Jiayi Weng (@Trinkle23897): Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm.

Jiayi Weng

@Trinkle23897

MTS @openai, author of the entire post-training RL infra, core contributor to ChatGPT/GPT-4/GPT-4o, etc. 30U30

Joined June 2014

143 Following, 7.3K Followers

Jiayi Weng (@Trinkle23897)

2026.05.08 03:49

