Haider.
@haider1
together, we build an intelligent future.
Joined November 2021
3.8K Following    66.3K Followers
really cool benchmark for long-horizon test-time adaptation

gpt-5.5 in codex leads on FutureSim, where agents interact with a chronological replay of real-world news and are tasked with predicting future events on some Polymarket questions

gpt-5.5 even moved ahead of the human market aggregate

interestingly, gemini 3.1 and opus 4.7 are missing