Wentao Guo
@WentaoGuo7
CS PhD student @PrincetonCS, Previously CS MEng + BS @CornellCIS
Joined November 2021
199 Following    1K Followers
🚀 SonicMoE 🚀 now runs at peak throughput on NVIDIA Blackwell GPUs 😃 54% and 35% higher fwd/bwd TFLOPS than the DeepGEMM baseline, and 21% higher fwd TFLOPS than the official Triton example. SonicMoE still maintains its minimum activation memory footprint: the same as a dense model with equal activated parameters, and independent of expert granularity. We wrote a blog post on how we leveraged Blackwell features and the software abstraction on QuACK: Work with @MayankMish98, @XinleC295, @istoica05, @tri_dao