METR

@METR_Evals

We work to scientifically measure whether and when AI systems might threaten catastrophic harm to society. Nonprofit.

Joined September 2023

35 Following · 24.4K Followers

METR @METR_Evals

2026.05.08 23:41

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.
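The "50%-time-horizon" is the length of task (measured by how long the task takes a human) at which the model is predicted to succeed half the time. As a rough illustration, assuming the approach METR has described publicly (fit a logistic curve of success probability against log2 of task length, then read off where it crosses 50%), here is a toy Python sketch. The data, the function names `fit_logistic` and `time_horizon`, and the plain gradient-ascent fit are hypothetical choices for illustration. This is not METR's code, and it does not attempt the confidence-interval estimate.

```python
import math

# Hypothetical toy results: (human task length in minutes, 1 = success / 0 = failure).
# Not METR data; chosen to be symmetric around 64 minutes so the answer is easy to check.
TOY_RESULTS = (
    [(4, 1)] * 4
    + [(16, 1)] * 3 + [(16, 0)]
    + [(64, 1)] * 2 + [(64, 0)] * 2
    + [(256, 1)] + [(256, 0)] * 3
    + [(1024, 0)] * 4
)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=20_000):
    """Fit P(success) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)  # per-example gradient of the Bernoulli log-likelihood
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def time_horizon(a: float, b: float, p: float = 0.5) -> float:
    """Task length (minutes) at which the fitted success probability equals p."""
    # Solve a + b * log2(t) = logit(p) for t.
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** ((logit_p - a) / b)


xs = [math.log2(minutes) for minutes, _ in TOY_RESULTS]
ys = [float(outcome) for _, outcome in TOY_RESULTS]
a, b = fit_logistic(xs, ys)
print(f"50% time horizon: {time_horizon(a, b):.1f} minutes")
```

Because the toy outcomes are symmetric around 64 minutes (log2 length 6), the fitted 50% horizon lands at about 64 minutes; passing `p=0.8` instead gives a shorter horizon, since success becomes less likely as tasks get longer.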

显示更多