Since GPT-4o, frontier average scores on METR-Horizon have been remarkably predictable over time.
A simple linear fit of average score vs. release date gives R² = 0.984.
The relationship between average score and log time horizon is also extremely strong:
- p50 horizon: r = 0.998
- p80 horizon: r = 0.992
Claude Mythos scored 85.21%, slightly above the ~83.3% predicted by the pre-Mythos linear trend.
The implied doubling time for METR time horizons is still about 103 days, the same value we reported on February 12th, 2026.
If current trends continue:
- 90% score: July 7, 2026
  - implied p50 horizon: 27.5 hours
  - implied p80 horizon: 4.8 hours
- 95% score: September 18, 2026
  - implied p50 horizon: 44.9 hours, or 1.9 days
  - implied p80 horizon: 7.8 hours
- 100% score: November 30, 2026
  - implied p50 horizon: 73.4 hours, or 3.1 days
  - implied p80 horizon: 12.8 hours
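
As a rough consistency check, the projected milestones above can be used to back out the doubling time they imply. The sketch below assumes the horizon grows exponentially between the first and last milestone; the variable and function names are illustrative only, and the fitted model behind the projections is not reproduced here.

```python
from datetime import date
from math import log2

# Projected milestones copied from the list above:
# (projected date, p50 horizon in hours, p80 horizon in hours)
milestones = [
    (date(2026, 7, 7), 27.5, 4.8),     # 90% score
    (date(2026, 9, 18), 44.9, 7.8),    # 95% score
    (date(2026, 11, 30), 73.4, 12.8),  # 100% score
]


def doubling_time_days(d0, h0, d1, h1):
    """Days per doubling, assuming exponential growth from (d0, h0) to (d1, h1)."""
    return (d1 - d0).days / log2(h1 / h0)


first, last = milestones[0], milestones[-1]
p50_doubling = doubling_time_days(first[0], first[1], last[0], last[1])
p80_doubling = doubling_time_days(first[0], first[2], last[0], last[2])

print(f"p50 doubling time: {p50_doubling:.1f} days")
print(f"p80 doubling time: {p80_doubling:.1f} days")
```

Both values should land close to the 103-day figure quoted above. Small differences are expected because the horizons in the list are rounded to one decimal place.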