vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join the community discussion!
Joined March 2024
36 Following · 38.6K Followers
vLLM tops the Artificial Analysis leaderboard 🎉

vLLM tops @ArtificialAnlys on DeepSeek V3.2 and ranks among the top deployments of MiniMax-M2.5 and Qwen 3.5 397B. The leading deployments of these models are now open source.

How each result was built:

🔹 DeepSeek V3.2: aggressive op fusion across the attention path cut the per-layer kernel count from ~33 toward ~10.
🔹 MiniMax-M2.5: a custom EAGLE3 draft model, trained via TorchSpec on the target model's own token distribution, plus a custom QK-norm fusion for MiniMax's TP-aware attention.
🔹 Qwen 3.5 397B: targeted fusions plus a QK-norm fix for Qwen's linear-attention path.

Every optimization is in vLLM main or on its way upstream. Huge thank you to @inferact, @digitalocean, @nvidia, @RedHat_AI, and the vLLM community 🙏

Full breakdown 👇
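The post does not say how the EAGLE3 draft is used at serving time. As a rough illustration of why training a draft on the target model's own token distribution pays off, here is a minimal sketch of the generic speculative sampling accept/reject rule, simplified to one drafted token per step. It is not vLLM's, EAGLE3's, or TorchSpec's implementation; the function names and toy distributions are hypothetical. Under this rule the expected acceptance rate is the sum over tokens of min(p(x), q(x)), where p is the target and q the draft, so the closer q tracks p, the more drafted tokens survive verification, while the output stays distributed exactly as p.

```python
import random


def acceptance_rate(p, q):
    """Expected probability that a token drafted from q is accepted when
    verified against target p under standard speculative sampling:
    sum over tokens x of min(p(x), q(x))."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


def speculative_step(p, q, rng):
    """One draft-then-verify step (hypothetical helper, single drafted token).

    p, q: dicts mapping token -> probability (target and draft).
    Returns (token, accepted). The returned token is distributed exactly as p.
    """
    # Draft: sample a candidate token from the draft distribution q.
    tokens = list(q)
    draft = rng.choices(tokens, weights=[q[t] for t in tokens])[0]

    # Verify: accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p.get(draft, 0.0) / q[draft]):
        return draft, True

    # On rejection, resample from the residual max(p - q, 0);
    # random.choices normalizes the weights itself.
    residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
    choices = list(residual)
    token = rng.choices(choices, weights=[residual[t] for t in choices])[0]
    return token, False


if __name__ == "__main__":
    target = {"a": 0.6, "b": 0.3, "c": 0.1}
    aligned_draft = {"a": 0.55, "b": 0.35, "c": 0.10}     # tracks the target
    mismatched_draft = {"a": 0.1, "b": 0.3, "c": 0.6}     # does not
    print("aligned:", acceptance_rate(target, aligned_draft))
    print("mismatched:", acceptance_rate(target, mismatched_draft))
```

With these toy numbers the aligned draft is accepted about 95% of the time versus about 50% for the mismatched one, which is the intuition behind fitting a draft to the target's own outputs: fewer rejections mean more tokens emitted per target forward pass.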