Lisan al Gaib
@scaling01
lead them to paradise LisanBench: Impressum & Datenschutz:
Joined August 2024
1K Following  43.9K Followers
New forecasting benchmark: FutureSim. GPT-5.5 performs best, at 25%, but Mythos, Gemini 3.1 Pro, and Opus 4.7 are not included. Based on their Brier Skill Scores, the models don't seem to be much better than simply assigning equal probabilities to all outcomes.
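The Brier Skill Score compares a model's Brier score with that of a reference forecast: BSS = 1 - BS_model / BS_reference. With equal probabilities as the reference, a BSS near 0 means the model is no better than uniform guessing, and 1 means perfect forecasts. The following is a minimal sketch, assuming the textbook multi-class Brier score and a uniform reference. FutureSim's exact scoring setup isn't described in the post, and the function names here are hypothetical.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Mean multi-class Brier score: average over events of sum_k (p_k - o_k)^2,
    where o_k is 1 for the outcome that happened and 0 otherwise."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need at least one forecast and exactly one outcome per forecast")
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


def uniform_forecasts(probs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical reference: equal probability on every outcome of each question."""
    return [[1.0 / len(p)] * len(p) for p in probs]


def brier_skill_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """BSS = 1 - BS_model / BS_uniform (0 = no better than equal probabilities)."""
    bs_model = brier_score(probs, outcomes)
    bs_ref = brier_score(uniform_forecasts(probs), outcomes)
    if bs_ref == 0.0:
        # Only happens when every question has a single outcome.
        raise ValueError("uniform reference has zero Brier score")
    return 1.0 - bs_model / bs_ref


if __name__ == "__main__":
    outcomes = [0, 1]
    perfect = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    uniform = [[1 / 3] * 3, [1 / 3] * 3]
    print(brier_skill_score(perfect, outcomes))  # 1.0
    print(brier_skill_score(uniform, outcomes))  # 0.0
```

Because the reference is computed from the same questions, a model that hedges almost evenly across outcomes lands close to 0 even when its top-1 pick is sometimes right.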