Haider.
@haider1
together, we build an intelligent future.
Joined November 2021
3.8K Following    66.3K Followers
really cool benchmark for long-horizon test-time adaptation

gpt-5.5 in codex leads on FutureSim, where agents interact with a chronological replay of real-world news and are tasked with predicting future events on some Polymarket questions

gpt-5.5 even moved ahead of the human market aggregate

interestingly, gemini 3.1 and opus 4.7 are missing