Jordan Nanos
@JordanNanos
Member of Technical Staff @SemiAnalysis_
Joined December 2017
838 Following    3.3K Followers
Cool idea from DeepSeek in their DualPath paper! Instead of loading all KVs directly onto GPUs from local NVMe (or DRAM) and bottlenecking on the local PCIe bus, they can stage the KVs in the DRAM on the decode GPU servers, and then transfer the KVs to the prefill GPUs via GDRDMA (GPUDirect RDMA)
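A rough way to see why a second path helps is a back-of-the-envelope sketch. It is a toy model and not anything from the paper: it assumes the local path and the staged RDMA path are fully independent and can run in parallel, and the function name `split_load` and the bandwidth figures are made up for illustration.

```python
def split_load(total_gb, local_gbps, remote_gbps):
    """Split a KV load across two paths so both finish at the same time.

    Toy model: assumes the two paths share no bottleneck (an assumption,
    not a claim about real hardware). Returns (local_gb, remote_gb, seconds).
    """
    if local_gbps <= 0 or remote_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    # Give each path a share of the bytes proportional to its bandwidth.
    local_gb = total_gb * local_gbps / (local_gbps + remote_gbps)
    remote_gb = total_gb - local_gb
    # With a proportional split, both paths finish together.
    seconds = total_gb / (local_gbps + remote_gbps)
    return local_gb, remote_gb, seconds


# Hypothetical numbers: 100 GB of KVs, 30 GB/s local, 20 GB/s staged path.
single_path_seconds = 100.0 / 30.0
local_gb, remote_gb, dual_path_seconds = split_load(100.0, 30.0, 20.0)
print(f"single path: {single_path_seconds:.2f} s, dual path: {dual_path_seconds:.2f} s")
```

Under these assumptions the load time is set by the sum of the two bandwidths rather than by the local bus alone. Real gains depend on how much the two paths actually overlap.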