cv usk
@cv_usk
AI / Software Research Notes AI Agent, LLMOps, MLOps, Software Architecture
Joined May 2026
240 Following    207 Followers
🌍 "When can self-supervised learning recover the world's true structure?" This theory paper from LeCun and colleagues proves the answer is: only when the latent variables are Gaussian.

Title: When Does LeJEPA Learn a World Model?
URL:

💡 Overview
The paper pins down when LeJEPA (JEPA + Gaussian regularization SIGReg + alignment) can recover the world's latent variables linearly, up to rotation, from nonlinear observations. The key condition: the latents are Gaussian and evolve under an OU process.

⚠️ The problem
If a representation distorts the world's true degrees of freedom, reliable planning and compositional generalization break down. It was unclear when self-supervised learning provably recovers world structure.

🛠 Approach and core insight
・The optimal representation extracts the "slowest features" of the latent process, ordered by eigenvalue
・Via Hermite polynomials and Mehler's formula, cross-view correlation decays as ρ^d for degree-d nonlinearity
・So alignment penalizes every degree of nonlinearity, making the linear map the unique optimum
・With linear identifiability, planning in latent space yields the same optimal actions as the true world (directly usable for control)
・Conversely, demanding the optimum always be linear forces the latent distribution to be Gaussian (uniqueness)

📊 Results
・SIGReg and VICReg keep R² > 0.999 for linear recovery up to 1024 dimensions
・Sweeping the generalized-normal family, R² peaks sharply at α=2 (Gaussian)
・In pixel-based robot control, Gaussian OU pairs hit R²=0.95, while non-Gaussian real trajectories stay at R²≤0.5
・Control cost tracks R² monotonically, and the Gaussian encoder is oracle-level

#WorldModels# #SelfSupervisedLearning#
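
A quick numerical check of the ρ^d decay mentioned under "Approach and core insight". This is a minimal sketch, not code from the paper. It assumes two scalar views (x, y) that are standard Gaussian with correlation ρ, which is what two time steps of a stationary, unit-variance OU process give you. The function names, the node count, and the choice ρ = 0.8 are all made up for illustration. It uses Gauss-Hermite quadrature from NumPy to evaluate the correlation between the degree-d normalized Hermite feature of each view.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def normalized_hermite(x, degree):
    """Probabilists' Hermite polynomial He_degree(x), scaled to unit variance
    under a standard Gaussian (E[He_d(x)^2] = d!)."""
    coeffs = np.zeros(degree + 1)
    coeffs[degree] = 1.0  # select only the degree-d basis polynomial
    return hermeval(x, coeffs) / math.sqrt(math.factorial(degree))


def hermite_cross_correlation(rho, degree, n_nodes=32):
    """E[h_d(x) h_d(y)] for standard Gaussian x, y with correlation rho.

    Hypothetical helper: writes y = rho * x + sqrt(1 - rho^2) * z with z an
    independent standard Gaussian, then integrates over (x, z) with a tensor
    Gauss-Hermite rule, which is exact for these polynomial integrands.
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # turn the rule into a probability measure
    x = nodes[:, None]
    z = nodes[None, :]
    y = rho * x + math.sqrt(1.0 - rho**2) * z
    w2d = weights[:, None] * weights[None, :]
    integrand = normalized_hermite(x, degree) * normalized_hermite(y, degree)
    return float(np.sum(w2d * integrand))


if __name__ == "__main__":
    rho = 0.8  # assumed correlation between the two views
    for d in range(1, 6):
        value = hermite_cross_correlation(rho, d)
        print(f"degree {d}: cross-view correlation {value:.6f}, rho**d {rho**d:.6f}")
```

Only the degree-1 (linear) feature keeps the full correlation ρ. Every higher degree is shrunk by an extra factor of ρ, which is the intuition behind alignment favoring the linear map.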