Kimi K2.6
@Kimi_Moonshot is the new leading open-weights agent model, landing at #4 on Claw-Eval (Pass^3: 62.3%).
Key takeaways:
- 👑 Best open-source agent, period: a Pass^3 of 62.3% is the highest of any open-weights model, about 8 points behind frontier model Claude Opus 4.6 (70.4%). A Pass@3 of 80.9% closes most of the gap to closed models.
- 💪 Frontier-tier robustness: a score of 94.7 (±0.9), statistically tied with Claude Sonnet 4.6 (94.6) and Claude Opus 4.6 (94.2). K2.6's agent trajectories no longer collapse under perturbation.
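Why Pass^3 (62.3%) sits well below Pass@3 (80.9%): Pass^k rewards consistency, Pass@k rewards getting it right at least once. A minimal Python sketch, assuming Claw-Eval uses the common definitions (Pass^k: all k sampled trials of a task succeed; Pass@k: at least one does) averaged over tasks. The function names and toy data are hypothetical, not Claw-Eval's code or results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k trials (drawn from n runs, c successful) succeeds."""
    if n - c < k:
        return 1.0  # too few failures to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Chance that all k trials (drawn from n runs, c successful) succeed."""
    return comb(c, k) / comb(n, k)  # comb(c, k) is 0 when c < k


def aggregate(results: list[tuple[int, int]], k: int) -> tuple[float, float]:
    """Average (Pass@k, Pass^k) over tasks given (n_runs, n_successes) per task."""
    at = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    hat = sum(pass_hat_k(n, c, k) for n, c in results) / len(results)
    return at, hat


# Hypothetical toy data: 4 tasks, 3 runs each, with 3, 2, 0 and 3 successes.
toy = [(3, 3), (3, 2), (3, 0), (3, 3)]
print(aggregate(toy, k=3))  # (0.75, 0.5)
```

The task with 2 of 3 successes counts toward Pass@3 but not Pass^3, which is exactly the kind of flakiness that opens a gap between the two numbers.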
The open-source agent frontier just moved.
Full Leaderboard: