Lisan al Gaib
@scaling01
lead them to paradise LisanBench: Impressum & Datenschutz:
Joined August 2024
1K Following    43.9K Followers
New forecasting benchmark: FutureSim

GPT-5.5 performs the best at 25%, but Mythos, Gemini 3.1 Pro and Opus 4.7 are not included.

Based on their Brier Skill Score, the models don't seem to be much better than just assigning equal probabilities to all outcomes.
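
For context on the Brier Skill Score comparison, here is a minimal sketch of how such a score is commonly computed. The post does not say how FutureSim defines it, so this assumes the textbook form, BSS = 1 - BS / BS_ref, with a multi-class Brier score and an equal-probability forecast as the reference. The function names and the example forecasts are hypothetical and are not taken from the benchmark.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Mean multi-class Brier score.

    probs[i] is the forecast distribution for question i,
    outcomes[i] is the index of the outcome that actually happened.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        # Squared error against the one-hot vector of the realized outcome.
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(outcomes)


def brier_skill_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """BSS = 1 - BS / BS_ref, with a uniform forecast as the reference (an assumption)."""
    uniform = [[1.0 / len(p)] * len(p) for p in probs]
    bs_ref = brier_score(uniform, outcomes)
    if bs_ref == 0.0:
        # Only happens when every question has a single possible outcome.
        raise ValueError("reference Brier score is zero; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Hypothetical forecasts on two four-outcome questions (not benchmark data).
forecasts = [[0.4, 0.2, 0.2, 0.2], [0.1, 0.5, 0.2, 0.2]]
realized = [0, 1]
print(brier_skill_score(forecasts, realized))
```

Under this definition a score of 0 means the forecaster is exactly as good as assigning equal probabilities to every outcome, 1 means perfect forecasts, and negative values mean worse than the uniform baseline. A score near 0 is what "not much better than equal probabilities" would look like.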