Lei Li (@_TobiasLee):

Kimi K2.6 @Kimi_Moonshot is the new leading open-weights agent model, landing at #4 on Claw-Eval (Pass^3: 62.3%).

Key takeaways:
- 👑 Best open-source agent, period: Pass^3 of 62.3% is the highest of any open-weights model, about 8 points behind frontier Claude Opus 4.6 (70.4%). Pass@3 of 80.9% closes most of the gap to closed models.
- 💪 Frontier-tier robustness: 94.7 (±0.9), statistically tied with Claude Sonnet 4.6 (94.6) and Claude Opus 4.6 (94.2). K2.6's agent trajectories no longer collapse under perturbation.

The open-source agent frontier just moved.

Full Leaderboard:

2026.04.21 07:35
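
For readers unfamiliar with the two headline metrics, here is a minimal sketch of how they differ. The post does not define them, so this assumes the common convention: Pass^3 counts a task only if all three trials succeed (consistency), while Pass@3 counts it if at least one of three trials succeeds (capability). The task names and outcomes below are made up for illustration and are not Claw-Eval data.

```python
from typing import Dict, List


def pass_hat_k(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where every trial succeeded (assumed meaning of Pass^k)."""
    return sum(all(outcomes) for outcomes in trials.values()) / len(trials)


def pass_at_k(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks where at least one trial succeeded (assumed meaning of Pass@k)."""
    return sum(any(outcomes) for outcomes in trials.values()) / len(trials)


# Hypothetical results: three trials per task, True means the trial passed.
results = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, False, False],
    "task_d": [True, True, True],
}

print(f"Pass^3: {pass_hat_k(results):.1%}")  # only task_a and task_d count
print(f"Pass@3: {pass_at_k(results):.1%}")   # task_b counts as well
```

Under this reading, Pass^3 can never exceed Pass@3 on the same trials, and the gap between the two reflects how often a model succeeds on a task only some of the time.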