Jordan Nanos
@JordanNanos
Member of Technical Staff @SemiAnalysis_
Joined December 2017
838 Following    3.3K Followers
cool idea from DeepSeek in their DualPath paper! instead of loading all KVs directly onto GPUs from local NVMe (or DRAM) and bottlenecking on the local PCIe bus, they can stage the KVs in the DRAM on the decode GPU servers, and then transfer the KVs to the prefill GPUs via GDRDMA
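
A rough back-of-envelope sketch of why a second path could help. It assumes the direct path (local NVMe/DRAM over PCIe) and the staged path (decode-server DRAM, then GDRDMA) are independent and can run in parallel, which the post does not state. The function names and the bandwidth figures are hypothetical placeholders, not numbers from the paper.

```python
def single_path_seconds(kv_gb: float, direct_gbps: float) -> float:
    """Time to load every KV byte over the direct (local PCIe) path only."""
    return kv_gb / direct_gbps


def dual_path_seconds(kv_gb: float, direct_gbps: float, staged_gbps: float) -> float:
    """Time when KV bytes are split across both paths in proportion to
    each path's bandwidth, so that both finish at the same moment.
    Assumes the paths do not share a bottleneck (an illustrative assumption)."""
    return kv_gb / (direct_gbps + staged_gbps)


def direct_fraction(direct_gbps: float, staged_gbps: float) -> float:
    """Share of KV bytes to send over the direct path under that split."""
    return direct_gbps / (direct_gbps + staged_gbps)


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic easy to follow.
    kv_gb, direct, staged = 100.0, 20.0, 30.0
    print("direct only:", single_path_seconds(kv_gb, direct), "s")
    print("both paths: ", dual_path_seconds(kv_gb, direct, staged), "s")
    print("direct share:", direct_fraction(direct, staged))
```

Under these assumptions the load time depends on the sum of the two bandwidths, so the local PCIe bus stops being the only limit. Real systems would also have to account for contention on the network and on the decode servers' DRAM.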