Jordan Nanos

@JordanNanos

Member of Technical Staff @SemiAnalysis_

Jordan Nanos@JordanNanos

2026.02.26 20:57

Cool idea from DeepSeek in their DualPath paper! Instead of loading all of the KV cache directly onto the GPUs from local NVMe (or DRAM) and bottlenecking on the local PCIe bus, they can stage the KV cache in DRAM on the decode GPU servers and then transfer it to the prefill GPUs via GPUDirect RDMA (GDRDMA).
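
To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch, not anything taken from the paper. It assumes the KV load can be split freely between the local PCIe path and a remote path (decode-server DRAM to prefill GPU over RDMA), and that the two paths run fully in parallel without interfering with each other. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def single_path_seconds(kv_gb: float, local_gb_per_s: float) -> float:
    """Time to load the whole KV cache over the local path only."""
    if local_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return kv_gb / local_gb_per_s


def dual_path_split(kv_gb: float, local_gb_per_s: float, remote_gb_per_s: float):
    """Split the load so both paths finish at the same time.

    Assumption (hypothetical model): the paths are independent, so the
    best split is proportional to each path's bandwidth.
    Returns (local_gb, remote_gb, seconds).
    """
    if local_gb_per_s <= 0 or remote_gb_per_s <= 0:
        raise ValueError("bandwidths must be positive")
    total = local_gb_per_s + remote_gb_per_s
    local_gb = kv_gb * local_gb_per_s / total
    remote_gb = kv_gb - local_gb
    return local_gb, remote_gb, kv_gb / total


if __name__ == "__main__":
    # Hypothetical figures, not measurements from the paper.
    kv_gb, local_bw, remote_bw = 40.0, 25.0, 50.0
    print("single path:", single_path_seconds(kv_gb, local_bw), "s")
    local_gb, remote_gb, seconds = dual_path_split(kv_gb, local_bw, remote_bw)
    print("dual path:", local_gb, "GB local,", remote_gb, "GB remote,", seconds, "s")
```

Under this simplified model the second path helps exactly as much as the bandwidth it adds, so the gain depends entirely on how much spare RDMA and decode-server DRAM bandwidth is actually available.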