
cv usk
@cv_usk
AI / Software Research Notes AI Agent, LLMOps, MLOps, Software Architecture
Joined May 2026
238 Following    212 Followers
🧮 Are you just letting your MoE router train on vibes? This paper proposes a mathematically grounded design principle: align router rows with the principal singular direction of their expert matrices.

Title: Redesign Mixture-of-Experts Routers with Manifold Power Iteration
URL:

📝 Overview
MoE efficiently activates only a subset of experts per input, and the router decides which experts to use. This paper argues that aligning each router row with the principal singular direction of its expert matrix better represents token-expert affinity.

❓ Challenges Solved
Each router row acts as an "expert proxy" that computes similarity with the token, but there was no principled guideline for designing that proxy vector, and no clear principle for condensing an expert's information into a single representative vector.

💡 Methodology & Proposed Approach
・The proposed Manifold Power Iteration (MPI) adopts a "Power-then-Retract" paradigm
・It runs power iteration on the router weights so that they converge toward the principal singular direction
・A retraction operation imposes norm constraints, balancing computational efficiency and training stability
・It also provides a theoretical proof that router rows converge to the principal singular directions

🎯 Use Cases
It gives the routing design of large MoE LLMs a principled guideline rather than heuristics. This is useful when you want to improve expert utilization, for example by reducing skew toward particular experts.

📊 Experimental Results
・The authors pretrained MoE models across scales from 1B to 11B parameters and verified that alignment improves effectiveness
・Aligning to the principal singular direction makes expert-activation decisions more effective

As MoE becomes a standard component of large LLMs, this is a foundational contribution that answers why routing should be designed a certain way.

#MoE# #LLM#
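To make the "Power-then-Retract" idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it is textbook power iteration on W Wᵀ followed by rescaling onto a sphere, and the helper name `power_then_retract`, the `radius` parameter, the toy matrix, and the convention that the expert matrix `W` has one row per model dimension are all assumptions made for illustration.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]

def norm(v):
    """Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))

def power_then_retract(W, r, radius=1.0, steps=1):
    """Hypothetical helper, not the paper's update rule.

    Power step:  r <- W W^T r  (pulls r toward the principal left
                 singular direction of W).
    Retraction:  rescale r back to a fixed norm `radius`.
    """
    Wt = transpose(W)
    for _ in range(steps):
        r = matvec(W, matvec(Wt, r))       # power step
        n = norm(r)
        r = [radius * x / n for x in r]    # retraction onto the sphere
    return r

# Toy expert matrix (assumed shape: d_model=3 rows, d_hidden=2 columns).
# W W^T = diag(9, 1, 0), so the principal direction is the first axis.
W = [[3.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

random.seed(0)
r0 = [random.gauss(0.0, 1.0) for _ in range(3)]  # random initial router row
r = power_then_retract(W, r0, steps=50)

# |cosine| between the router row and the principal direction e_1.
alignment = abs(r[0]) / norm(r)
```

In this toy setup the router row ends up almost exactly aligned with the first axis while keeping unit norm. In a real training loop one would presumably interleave a small number of such steps with gradient updates rather than running 50 at once; how the paper schedules this is not stated in the post.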