vLLM tops the Artificial Analysis leaderboard
vLLM tops @ArtificialAnlys on DeepSeek V3.2 and ranks among the top deployments of MiniMax-M2.5 and Qwen 3.5 397B.
The leading deployments of these models are now open source.
How each result was built:
🔹 DeepSeek V3.2: Aggressive op fusion across the attention path collapsed ~33 per-layer kernels down toward ~10.
🔹 MiniMax-M2.5: Custom EAGLE3 draft trained against the target's own token distribution via TorchSpec, plus a custom QK-norm fusion for MiniMax's TP-aware attention.
🔹 Qwen 3.5 397B: Targeted fusions plus a QK-norm fix for Qwen's linear-attention path.
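For readers new to the term, here is a toy sketch of what "fusing" QK-norm means. It is not vLLM's kernel code. It assumes QK-norm is an RMSNorm applied per head to the query and key vectors, and every name in it (`rms_norm`, `qk_norm_unfused`, `qk_norm_fused`) is hypothetical. The unfused version makes two separate passes, standing in for two kernel launches; the fused version does the same math in one pass.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Unfused building block: RMS-normalize one head vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def qk_norm_unfused(q_heads, k_heads, q_weight, k_weight):
    """Two separate passes, standing in for two kernel launches."""
    q_out = [rms_norm(h, q_weight) for h in q_heads]
    k_out = [rms_norm(h, k_weight) for h in k_heads]
    return q_out, k_out


def qk_norm_fused(q_heads, k_heads, q_weight, k_weight, eps=1e-6):
    """One pass over Q and K heads together, standing in for one fused kernel."""
    n_q = len(q_heads)
    out = []
    for i, head in enumerate(q_heads + k_heads):
        # Pick the norm weight by position: Q heads first, then K heads.
        weight = q_weight if i < n_q else k_weight
        mean_sq = sum(v * v for v in head) / len(head)
        scale = 1.0 / math.sqrt(mean_sq + eps)
        out.append([v * scale * w for v, w in zip(head, weight)])
    return out[:n_q], out[n_q:]


# Hypothetical toy inputs: two query heads, one key head, head size 4.
q = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, -1.5]]
k = [[2.0, 0.0, -2.0, 1.0]]
qw = [1.0, 1.0, 1.0, 1.0]
kw = [0.5, 0.5, 0.5, 0.5]

unfused = qk_norm_unfused(q, k, qw, kw)
fused = qk_norm_fused(q, k, qw, kw)
```

Both paths return the same values. On a GPU, the gain from fusion comes from fewer kernel launches and fewer round trips to memory, which this pure-Python sketch does not measure.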
Every optimization is in vLLM main or on its way upstream.
Huge thank you to @inferact, @digitalocean, @nvidia, @RedHat_AI, and the vLLM community.
Full breakdown: