METR
@METR_Evals
We work to scientifically measure whether and when AI systems might threaten catastrophic harm to society. Nonprofit.
Joined September 2023
35 Following    24.4K Followers
We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.