Lisan al Gaib (@scaling01): new forecasting benchmark: FutureSim. GPT-5.5 performs the best at 25%, but Mythos, Gemini 3.1 Pro and Opus 4.7 are not included. Based on their Brier Skill Scores, the models don't seem to be much better than just assigning equal probabilities to all outcomes.


Lisan al Gaib (@scaling01), 2026.05.16 14:09

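To make the "equal probabilities" comparison concrete, here is a minimal sketch of a Brier Skill Score (BSS) computed against a uniform reference forecaster. The post does not say exactly how FutureSim defines its score, so this assumes the common multi-category Brier score (squared error against the one-hot outcome, summed over outcomes and averaged over questions) with a reference that puts 1/K on each of a question's K outcomes. The function names and the forecasts below are hypothetical illustrations, not FutureSim code or data.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Mean multi-category Brier score (lower is better).

    For each question, sum the squared differences between the forecast
    probabilities and the one-hot vector of the outcome that occurred,
    then average over questions.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need one outcome index per forecast, and at least one forecast")
    total = 0.0
    for p, k_true in zip(probs, outcomes):
        total += sum((p_k - (1.0 if k == k_true else 0.0)) ** 2 for k, p_k in enumerate(p))
    return total / len(probs)


def uniform_forecasts(probs: Sequence[Sequence[float]]) -> list[list[float]]:
    """Assumed reference forecaster: 1/K on each of a question's K outcomes."""
    return [[1.0 / len(p)] * len(p) for p in probs]


def brier_skill_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """BSS = 1 - BS / BS_ref, with the uniform forecaster as the reference.

    1.0 is a perfect forecast, 0.0 is no better than equal probabilities,
    and negative values are worse than equal probabilities.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(uniform_forecasts(probs), outcomes)
    return 1.0 - bs / bs_ref


if __name__ == "__main__":
    # Hypothetical forecasts for three questions with 2, 3 and 4 possible outcomes
    # (not FutureSim data); outcomes[i] is the index of what actually happened.
    probs = [[0.7, 0.3], [0.2, 0.5, 0.3], [0.1, 0.6, 0.2, 0.1]]
    outcomes = [0, 2, 1]
    print(f"{brier_score(probs, outcomes):.3f}")  # 0.393
    print(f"{brier_skill_score(probs, outcomes):.3f}")  # 0.384
    print(f"{brier_skill_score(uniform_forecasts(probs), outcomes):.3f}")  # 0.000
```

Under this convention the uniform forecaster scores (K − 1)/K on a K-outcome question no matter which outcome occurs, so the reference gets harder to beat by luck as K grows. A BSS close to 0, as described for the FutureSim models, means the forecasts add little over spreading probability evenly.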