Lisan al Gaib
@scaling01
lead them to paradise LisanBench: Impressum & Datenschutz:
Joined August 2024
1K Following    43.9K Followers
new forecasting benchmark: FutureSim. GPT-5.5 performs the best at 25%, but Mythos, Gemini 3.1 Pro and Opus 4.7 are not included. Based on their Brier Skill Score, the models don't seem to be much better than just assigning equal probabilities to all outcomes.
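For readers unfamiliar with the metric, here is a minimal sketch of a Brier Skill Score measured against a uniform baseline. It assumes the common multi-outcome definition of the Brier score and an equal-probability reference forecast; how FutureSim actually computes its score is not stated in the post, and the function names are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and one-hot outcomes.

    forecasts: list of probability lists, one per question.
    outcomes: list of indices of the outcome that actually occurred.
    """
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        total += sum(
            (p - (1.0 if k == actual else 0.0)) ** 2
            for k, p in enumerate(probs)
        )
    return total / len(forecasts)


def brier_skill_score(forecasts, outcomes):
    """BSS = 1 - BS_model / BS_reference.

    Assumption: the reference assigns equal probability to every outcome.
    """
    uniform = [[1.0 / len(probs)] * len(probs) for probs in forecasts]
    reference = brier_score(uniform, outcomes)
    return 1.0 - brier_score(forecasts, outcomes) / reference
```

Under this definition a score of 0 means the forecaster does no better than the equal-probability baseline, 1 is a perfect forecast, and negative values are worse than the baseline.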