Search SOTA on X — X Web Viewer

2026.04.24 04:51

Kimi K2.6 is the new SOTA open model in Vision and Document Arena, with solid gains since Kimi K2.5:
- #1 open on Vision Arena (#15 overall), +14 over #2 Kimi K2.5 (Thinking)
- #1 open on Document Arena (#8 overall), +9 over K2.5 and on par with proprietary models like Muse Spark and Gemini 3.1 Pro.
Huge congrats again to the @Kimi_Moonshot team on the open source progress!


Kimi.ai@Kimi_Moonshot

2026.04.21 04:10

Kimi is the current open-source SOTA on Artificial Analysis

Artificial Analysis@ArtificialAnlys

2026.04.21 03:02

Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54), behind only Anthropic, Google, and OpenAI (all 57).
Key takeaways:
➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, which is a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring the performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness called Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category.
➤ Low hallucination rate: Kimi K2.6 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it similarly to other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%).
➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).
➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, same as the previous two generations of models Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.
➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s First Party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail.
➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k.
Further analysis in the threads below.


Qoder@qoder_ai_ide

2026.04.20 15:50

@Kimi_Moonshot Kimi-K2.6 just dropped in Qoder.
SOTA coding · Long-horizon execution · Agent swarms
At 0.3x credits.


Kimi.ai@Kimi_Moonshot

2026.04.20 15:28

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
- K2.6 is now live on in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: -
🔗 API:
🔗 Tech blog:
🔗 Weights & code:


Instagram@instagram

2025.09.18 16:25

Watch and prepare to be amazed at how magician Sean Sotaridona pulls off his latest trick with the help of Meta AI 🎩🪄🤯


Jiayi Weng@Trinkle23897

2026.05.08 03:49

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm.


Tether@tether

2026.05.07 12:04

Tether Unveils Medical AI That Runs on Phones, Outperforms Much Larger SoTA Models, and Can Cut the Cloud Out Entirely
Read more:


Roberto Nickson@rpnickson

2026.04.08 16:13

Meta just announced Muse Spark - the first model release from Meta Superintelligence Labs. It isn't SOTA, but very competitive with the leading models across a lot of important benchmarks. For example, Muse Spark Contemplating mode scored 50.4% on HLE with tools, compared to 52.1 for Opus 4.6 and 52.1 for GPT-5.4. It's available to try today in the Meta AI app, which just got a facelift.


Bindu Reddy@bindureddy

2026.05.10 01:33

It’s no longer a given that the next generation model will be better
- Opus 4.7 is legit worse than 4.6
- Gemini 3.1 worse than 2.5
- Sonnet 4.6 buggier than 4.5
The SOTA models are beginning to run around in circles


Alibaba Cloud@alibaba_cloud

2026.05.08 09:21

Smart Studio: Self-host the latest AI 🚀
Stop jumping between platforms. Everything you need to test and serve models is now in one place:
✅ Instant SOTA Access: Run Qwen3.6-Max, DeepSeek-v4, and the latest models the moment they drop.
✅ Full Multimodal Support: Access multimodal and Image & Video generation models.
✅ Visual Model Lab: Compare open vs. closed-source outputs side-by-side.
✅ HF-to-API in Minutes: Turn a Hugging Face model into a live API in minutes.
🔗:
#AlibabaCloud #SmartStudio #ModelExploration #GenAI #AInnovation #LLM
