We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks.
GB200 is not just a training platform: it is a major step up over Hopper for high-throughput inference on large MoE models.
We’ve developed our own inference engine, the Runtime-Optimized Serving Engine (ROSE), to serve models ranging from embeddings to trillion-parameter LLMs.
With CuTeDSL integrated into our inference engine, Perplexity can build specialized GPU kernels faster and bring models up to peak performance on NVIDIA Hopper and Blackwell GPUs.
Announcing Personal Computer.
Personal Computer is an always-on, local merge with Perplexity Computer that works for you 24/7.
It's personal, secure, and works across your files, apps, and sessions through a continuously running Mac mini.