
Jinjie Ni
@NiJinjie
Research Scientist @GoogleDeepMind
Joined April 2020
651 Following    3.6K Followers
Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) and autoregressive (AR) models from scratch, at up to 8B params, 480B tokens, and 480 epochs.

Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU, with no tricks and no cherry-picks.
> No saturation: more repeats = more gains.

🚨 We also dissected the serious methodological flaws in the parallel work “Diffusion Beats Autoregressive in Data-Constrained Settings”. Let’s raise the bar for open review!

🔗 Blog & details:

18 🧵s ahead: