cv usk(@cv_usk):
🌍 "When can self-supervised learning recover the world's true structure?" This theory paper from LeCun and colleagues proves the answer is: only when the latent variables are Gaussian.

Title: When Does LeJEPA Learn a World Model?
URL: https://t.co/zblVHGDR6w

💡 Overview
The paper pins down when LeJEPA (JEPA + Gaussian regularization SIGReg + alignment) can recover the world's latent variables linearly, up to rotation, from nonlinear observations. The key condition: the latents are Gaussian and evolve under an OU process.

⚠️ The problem
If a representation distorts the world's true degrees of freedom, reliable planning and compositional generalization break down. It was unclear when self-supervised learning provably recovers world structure.

🛠 Approach and core insight
・The optimal representation extracts the "slowest features" of the latent process, ordered by eigenvalue
・Via Hermite polynomials and Mehler's formula, cross-view correlation decays as ρ^d for degree-d nonlinearity
・So alignment penalizes every degree of nonlinearity, making the linear map the unique optimum
・With linear identifiability, planning in latent space yields the same optimal actions as the true world (directly usable for control)
・Conversely, demanding the optimum always be linear forces the latent distribution to be Gaussian (uniqueness)

📊 Results
・SIGReg and VICReg keep R² > 0.999 for linear recovery up to 1024 dimensions
・Sweeping the generalized-normal family, R² peaks sharply at α=2 (Gaussian)
・In pixel-based robot control, Gaussian OU pairs hit R²=0.95, while non-Gaussian real trajectories stay at R²≤0.5
・Control cost tracks R² monotonically, and the Gaussian encoder is oracle-level

#WorldModels #SelfSupervisedLearning
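
The ρ^d decay mentioned under "Approach and core insight" can be checked numerically. Below is a minimal sketch in plain Python, assuming a one-dimensional, unit-variance stationary OU latent, so that two views (x, y) are jointly Gaussian with correlation ρ. The helper names (hermite, ou_pair, cross_view_correlation) and the choice ρ = 0.8 are illustrative and not taken from the paper. Mehler's formula implies E[He_d(x) He_d(y)] = d! ρ^d for probabilists' Hermite polynomials, so the normalized estimate should land near ρ^d.

```python
import math
import random


def hermite(d, x):
    """Probabilists' Hermite polynomial He_d(x), via He_{n+1} = x*He_n - n*He_{n-1}."""
    prev, cur = 1.0, x
    if d == 0:
        return prev
    for n in range(1, d):
        prev, cur = cur, x * cur - n * prev
    return cur


def ou_pair(rho, rng):
    """Draw (x, y) from a stationary unit-variance OU process with corr(x, y) = rho.

    Assumption: rho stands for exp(-theta * dt) of the underlying OU process.
    """
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return x, y


def cross_view_correlation(d, rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of corr(He_d(x), He_d(y)) across the two views."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, y = ou_pair(rho, rng)
        total += hermite(d, x) * hermite(d, y)
    # He_d has mean 0 and variance d! under N(0, 1), so dividing by d!
    # turns the cross moment into a correlation.
    return total / (n_samples * math.factorial(d))


if __name__ == "__main__":
    rho = 0.8  # illustrative value
    for d in range(1, 5):
        estimate = cross_view_correlation(d, rho)
        print(f"d={d}  estimate={estimate:.3f}  rho^d={rho ** d:.3f}")
```

Estimates for higher degrees are noisier because the variance of He_d(x)·He_d(y) grows quickly with d. The qualitative picture is the one the post describes: each additional degree of nonlinearity loses another factor of ρ in cross-view correlation, so under this setup the degree-1 (linear) features are the ones alignment favors.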

2026.06.15 21:41
