METR
@METR_Evals
We work to scientifically measure whether and when AI systems might threaten catastrophic harm to society. Nonprofit.
Joined September 2023
35 Following    24.4K Followers
We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.