vLLM(@vllm_project):
🚀 vLLM-Omni v0.20.0 is out, aligned with upstream vLLM v0.20.0 (CUDA 13.0 · PyTorch 2.11 · Transformers 5.x).
⚡ Qwen3-Omni throughput +72% on H20, 32 conc (0.241 → 0.414 req/s) via talker / code2wav multi-replica scaling
🎙️ TTS faster & leaner: VoxCPM2 RTF 0.946 → 0.106 · Fish Speech Fast AR latency -53% · Qwen3-TTS / Voxtral-TTS Code2Wav saves ~3.2 GiB
🎨 Diffusion dynamic step-level batching: +7.8% throughput / -5.8% latency
🆕 New / improved: HunyuanImage-3.0, ERNIE T2I, AudioX, Wan2.2-S2V, LTX-2.3, FastGen Wan 2.1
📱 Wan2.2 on NPU production-ready: MindIE-SD, fused ops, VAE BF16, HSDP/USP, +50–60% perf
🧮 Quant expanded: Qwen Omni W4A16, OmniGen2 FP8, Z-Image FP8, HunyuanImage3 NPU, GLM-Image
🧩 Multi-backend updates across CUDA / ROCm / MUSA / NPU / XPU
Check it out →

2026.05.08 14:00
