vLLM (@vllm_project)

2026.05.15 04:03

Great work at @baseten running vLLM-Omni in production — open-source, production-grade, cost-efficient omni-modal serving 🎙️ Multi-stage audio, streaming multi-modal, real-time TTS — workloads where closed-source APIs have been the default. →

Baseten (@baseten)

2026.05.14 17:26

We serve Qwen3-TTS on vLLM-Omni at $3 per 1M characters. That's 90% lower in cost than comparable closed-source TTS APIs. Our engineers optimized a single-replica serving stack to get there. Details on the optimized stack and cost per concurrent stream here.
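
To make the pricing claim concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted in the post ($3 per 1M characters, 90% lower cost); the function names are hypothetical, and the "implied baseline" is just what the 90% figure works out to arithmetically, not a quoted price from any specific vendor.

```python
# Figures taken from the post; everything else here is illustrative.
PRICE_PER_MILLION_CHARS_USD = 3.0   # quoted price for Qwen3-TTS on vLLM-Omni
CLAIMED_SAVINGS_PERCENT = 90        # "90% lower in cost"


def tts_cost_usd(num_chars, price_per_million=PRICE_PER_MILLION_CHARS_USD):
    """Cost in USD of synthesizing num_chars characters at a per-1M-character price."""
    return num_chars / 1_000_000 * price_per_million


def implied_baseline_price_usd(price_per_million=PRICE_PER_MILLION_CHARS_USD,
                               savings_percent=CLAIMED_SAVINGS_PERCENT):
    """Per-1M-character price that a 'savings_percent lower' claim implies.

    If new = baseline * (1 - savings), then baseline = new / (1 - savings).
    Percent arithmetic is used to avoid floating-point noise from 1 - 0.9.
    """
    return price_per_million * 100 / (100 - savings_percent)


if __name__ == "__main__":
    print(tts_cost_usd(250_000))          # cost of a 250,000-character job
    print(implied_baseline_price_usd())   # baseline implied by the 90% claim
```

Under these assumptions, a 250,000-character job costs $0.75, and the 90% claim implies a comparison price of about $30 per 1M characters.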