
Lisan al Gaib
@scaling01
lead them to paradise LisanBench: Impressum & Datenschutz:
Joined August 2024
1K Following    43.9K Followers
Since GPT-4o, frontier average scores on METR-Horizon have been remarkably predictable over time. A simple linear fit of average score vs. release date gives R² = 0.984.

The relationship between average score and log time horizon is also extremely strong:
- p50 horizon: r = 0.998
- p80 horizon: r = 0.992

Claude Mythos scored 85.21%, slightly above the ~83.3% predicted by the pre-Mythos linear trend.

The implied doubling time for METR time horizons is still about 103 days, the same value we reported on February 12th, 2026.

If current trends continue:
- 90% score: July 7, 2026
  - implied p50 horizon: 27.5 hours
  - implied p80 horizon: 4.8 hours
- 95% score: September 18, 2026
  - implied p50 horizon: 44.9 hours, or 1.9 days
  - implied p80 horizon: 7.8 hours
- 100% score: November 30, 2026
  - implied p50 horizon: 73.4 hours, or 3.1 days
  - implied p80 horizon: 12.8 hours
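
As a rough consistency check on the projections above, the sketch below takes the 90% milestone as a starting point and extrapolates the horizons forward with the stated 103-day doubling time. It assumes a clean exponential, horizon(t) = horizon(t0) * 2^(days / 103), which is only one reading of "implied doubling time". The names `extrapolate` and `implied_doubling_days` are illustrative and not from the original analysis.

```python
from datetime import date
from math import log2

DOUBLING_DAYS = 103  # doubling time quoted in the post

# Milestones as quoted in the post: score -> (date, p50 hours, p80 hours)
milestones = {
    90: (date(2026, 7, 7), 27.5, 4.8),
    95: (date(2026, 9, 18), 44.9, 7.8),
    100: (date(2026, 11, 30), 73.4, 12.8),
}


def extrapolate(h0, days, doubling_days=DOUBLING_DAYS):
    """Horizon after `days`, assuming pure exponential growth (an assumption)."""
    return h0 * 2 ** (days / doubling_days)


def implied_doubling_days(h0, h1, days):
    """Doubling time implied by two horizon values `days` apart."""
    return days / log2(h1 / h0)


base_date, base_p50, base_p80 = milestones[90]
for score, (when, p50, p80) in milestones.items():
    days = (when - base_date).days
    print(
        f"{score}% ({when}): "
        f"p50 quoted {p50} h, extrapolated {extrapolate(base_p50, days):.1f} h; "
        f"p80 quoted {p80} h, extrapolated {extrapolate(base_p80, days):.1f} h"
    )

# Going the other way: doubling time implied by the 90% and 95% p50 values.
days_90_to_95 = (milestones[95][0] - base_date).days
print(f"implied doubling: {implied_doubling_days(27.5, 44.9, days_90_to_95):.0f} days")
```

Under this assumption the extrapolated values land within rounding of the quoted ones, and the 5-point score steps are 73 days apart, which is what a linear score trend combined with a log-linear horizon relationship would produce.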