Great work at @baseten running vLLM-Omni in production: open-source, production-grade, cost-efficient omni-modal serving.
Multi-stage audio, streaming multi-modal, real-time TTS: workloads where closed-source APIs have been the default.

We serve Qwen3-TTS on vLLM-Omni at $3 per 1M characters. That's 90% cheaper than comparable closed-source TTS APIs.
Our engineers optimized a single-replica serving stack to get there. Details on the optimized stack and cost per concurrent stream here.
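
To make the pricing concrete, here is a rough sketch of the arithmetic behind the two figures quoted above ($3 per 1M characters, 90% lower). The function names are hypothetical, and the baseline price is only what those two numbers imply together, not a quoted price from any provider.

```python
# Figures taken from the post; everything else is illustrative.
PRICE_PER_MILLION_CHARS_USD = 3.00  # "$3 per 1M characters"
DISCOUNT_VS_CLOSED_SOURCE = 0.90    # "90% lower"


def tts_cost_usd(num_chars: int,
                 price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Cost in USD to synthesize num_chars characters at a per-1M-character price."""
    return num_chars / 1_000_000 * price_per_million


def implied_baseline_price(price: float = PRICE_PER_MILLION_CHARS_USD,
                           discount: float = DISCOUNT_VS_CLOSED_SOURCE) -> float:
    """Per-1M-character price that `price` would be `discount` below (assumed, not quoted)."""
    return price / (1.0 - discount)


if __name__ == "__main__":
    print(f"250k characters: ${tts_cost_usd(250_000):.2f}")
    print(f"Implied closed-source price per 1M chars: ${implied_baseline_price():.2f}")
```

At these rates a 250,000-character script works out to about $0.75, and a 90% saving implies a comparison price of roughly $30 per 1M characters.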