NVIDIA AI Infrastructure (@NVIDIAAIInfra): What does it take to serve agentic workloads on trillion-parameter models at 400 tokens per second per user, without trading throughput for latency? The NVIDIA Vera Rubin platform pairs Vera Rubin NVL72 with NVIDIA Groq 3 LPX to deliver low latency on trillion-parameter MoE models with 400K-token context, along with 35x higher throughput per megawatt. Learn how the deterministic LPU chip-to-chip (C2C) fabric and extreme co-design address agentic AI's scale-up challenges. ➡️

2026.05.14 21:03