Percy Liang
@percyliang
professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of co-founder of @simile_ai, pianist
Joined October 2009
426 Following    104.5K Followers
Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.
Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our last 1e22 and ongoing 130B run. Animated visuals of how QB performed are available in the OpenAthena blog.