NVIDIA AI Infrastructure
@NVIDIAAIInfra
AI factories for the era of AI reasoning.
Joined November 2009
1.7K Following    63.7K Followers
💡 Why did @togethercompute choose NVIDIA Blackwell to serve DeepSeek-V4?

Because NVIDIA Blackwell is built for the bottlenecks that matter most in long-context inference:
→ KV-cache pressure during decode
→ MoE weight bandwidth during prefill

A single NVIDIA HGX B200 system can keep DeepSeek-V4’s compressed CSA/HCA/SWA cache layouts resident across many concurrent long-context requests, while native MXFP4 support enables efficient end-to-end quantized inference for V4’s MoE weights.

The result? Higher throughput, lower overhead, and optimized serving efficiency at scale.