DeepSeek v4 works fine, but it's not the frontier-pressing moment we saw with Kimi 2.6. On Notion eval data, it performs similarly to GPT 5.2, with understandable failings.
Most interesting: it doesn't scale well. It's remarkably slow. On multiple major, trusted, and performant US inference providers, we see it running 15x slower than GPT 5.2 and 2x slower than Opus 4.7, a problem Kimi never had.
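For concreteness, here is one way a slowdown multiple like "15x" can be derived: compare output throughput (tokens per second) for each model on the same provider and prompt. A minimal sketch with stdlib Python only; the throughput numbers below are hypothetical placeholders, not measurements from the evals above.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput: output tokens divided by wall-clock generation time."""
    return n_tokens / elapsed_s

def slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return baseline_tps / candidate_tps

# Hypothetical illustrative numbers (not actual measurements):
baseline_tps = tokens_per_second(1500, 10.0)   # e.g. a fast baseline model
candidate_tps = tokens_per_second(1000, 100.0)  # e.g. a slow candidate model

print(f"{slowdown(baseline_tps, candidate_tps):.0f}x slower")  # -> 15x slower
```

In practice you'd time a streaming generation call with `time.perf_counter()` around it and count emitted tokens, averaging over several runs per provider to smooth out queueing noise.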
Curious whether it's a fundamental architectural issue or just a matter of time until inference providers optimize for it. Either way, it doesn't seem urgent if Kimi can outperform it. Maybe cheaper, but not groundbreaking.