Search WorldModel on X — X Web Viewer

2026.06.15 21:41

🌍 "When can self-supervised learning recover the world's true structure?" This theory paper from LeCun and colleagues proves the answer is: only when the latent variables are Gaussian.
Title: When Does LeJEPA Learn a World Model?
URL:
💡 Overview
The paper pins down when LeJEPA (JEPA + Gaussian regularization SIGReg + alignment) can recover the world's latent variables linearly, up to rotation, from nonlinear observations. The key condition: the latents are Gaussian and evolve under an OU process.
⚠️ The problem
If a representation distorts the world's true degrees of freedom, reliable planning and compositional generalization break down. It was unclear when self-supervised learning provably recovers world structure.
🛠 Approach and core insight
・The optimal representation extracts the "slowest features" of the latent process, ordered by eigenvalue
・Via Hermite polynomials and Mehler's formula, cross-view correlation decays as ρ^d for degree-d nonlinearity
・So alignment penalizes every degree of nonlinearity, making the linear map the unique optimum
・With linear identifiability, planning in latent space yields the same optimal actions as the true world (directly usable for control)
・Conversely, demanding the optimum always be linear forces the latent distribution to be Gaussian (uniqueness)
📊 Results
・SIGReg and VICReg keep R² > 0.999 for linear recovery up to 1024 dimensions
・Sweeping the generalized-normal family, R² peaks sharply at α=2 (Gaussian)
・In pixel-based robot control, Gaussian OU pairs hit R²=0.95, while non-Gaussian real trajectories stay at R²≤0.5
・Control cost tracks R² monotonically, and the Gaussian encoder is oracle-level
#WorldModels #SelfSupervisedLearning
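The ρ^d decay mentioned in the post can be checked numerically. The sketch below is not the paper's code. It assumes two views x and y that are standard Gaussian with correlation ρ, applies the degree-d probabilists' Hermite polynomial (scaled to unit variance) to both, and evaluates the expected product with Gauss-Hermite quadrature. The function names and the choice ρ = 0.8 are illustrative assumptions.

```python
from math import factorial, sqrt

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_normalized(x, d):
    """Degree-d probabilists' Hermite polynomial, scaled to unit variance under N(0, 1)."""
    coeffs = np.zeros(d + 1)
    coeffs[d] = 1.0  # select He_d in the Hermite basis
    return hermeval(x, coeffs) / sqrt(factorial(d))


def cross_view_correlation(rho, d, n_nodes=20):
    """E[h_d(x) h_d(y)] for standard Gaussian x, y with correlation rho.

    y is written as rho * x + sqrt(1 - rho^2) * z with z independent of x,
    so the expectation becomes a 2-D Gaussian integral over (x, z).
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # turn quadrature weights into probabilities
    x = nodes[:, None]
    z = nodes[None, :]
    y = rho * x + np.sqrt(1.0 - rho**2) * z
    integrand = hermite_normalized(x, d) * hermite_normalized(y, d)
    return float(np.sum(weights[:, None] * weights[None, :] * integrand))


if __name__ == "__main__":
    rho = 0.8  # illustrative value, not taken from the paper
    for d in range(1, 5):
        print(d, cross_view_correlation(rho, d), rho**d)
```

Because |ρ| < 1, the degree-1 (linear) feature keeps the highest cross-view correlation and every higher degree is attenuated more strongly, which is the intuition behind the post's claim that alignment favors the linear map.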

cv usk@cv_usk

2026.06.14 09:22

Interactive video world models that generate footage as you control them: here's a unified benchmark that finally measures them fairly 🎮
Title: WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
URL:
🎮 Overview
WBench is a unified framework for comprehensively evaluating interactive video world models. With 289 test cases and 1,058 interaction turns, it unifies text, 6-DoF pose, and discrete-action control so models with different native inputs can be compared on equal footing.
❓ Challenges Solved
Interactive world models are advancing fast, but there was no comprehensive standard to assess them. Existing benchmarks only partially covered the needed competencies, and differing input interfaces made apples-to-apples comparison hard.
💡 Methodology & Proposed Approach
Evaluation spans five core dimensions.
・Video quality
・Setting adherence
・Interaction adherence
・Consistency
・Physics compliance
Tasks cover navigation, subject action, event editing, and perspective switching. It uses 22 automatic sub-metrics combining specialist vision models with large multimodal models, all validated against human judgments.
📊 Experimental Results
Analyzing 20 state-of-the-art models revealed that no single model performs strongly across all dimensions, exposing characteristic strengths, weaknesses, and persistent challenges across approaches.
#WorldModels #Benchmark

cv usk@cv_usk

2026.06.13 11:29

🕶️ Walk through a first-person world with your own body motion, and explicitly specify what exists at a given location with an image and pose, including how it evolves over time. Meet AnchorWorld, an embodied egocentric world model.
Title: AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
URL:
📝 Overview
AnchorWorld generates first-person video controlled by full-body human motion. With "anchor views," it lets you explicitly specify what exists at a given 3D location and how it changes over time.
❓ Challenges Solved
Existing world models struggle to supervise full-body motion from egocentric video alone, and define environments only implicitly. They lacked both natural embodied control and localized world customization.
💡 Methodology & Proposed Approach
・Since most of the body is invisible in first person, it uses third-person video as auxiliary supervision to learn body-environment positioning
・An anchor has three parts: an RGB image, a 6-DoF viewpoint pose, and an evolution prompt that specify local appearance and temporal change
・3D RoPE spatially distinguishes multiple anchors, and masked cross-attention enables anchor-specific text control
・It trains in four stages (third-person, first-person, static anchors, dynamic evolution), built on Wan 2.2 TI2V 5B
🎯 Use Cases
It applies to embodied VR apps, first-person game environment design, embodied-AI training scenarios, and interactive video generation with localized control.
📊 Experimental Results
・On egocentric static scenes it reaches CLIP-V 0.885 and camera accuracy ATE 0.112m, beating PlayerOne and others
・On egocentric dynamic scenes, text alignment (VideoAlign-TA) is 0.717, far above CaM-Ego's 0.385
・It generalizes strongly to out-of-distribution UE and real-world scenes with little visual overlap between the initial view and anchors
#WorldModel #EmbodiedAI
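The three-part anchor described in the post (RGB image, 6-DoF viewpoint pose, evolution prompt) could be represented roughly as below. This is a hypothetical container, not AnchorWorld's actual interface: the class name `AnchorView`, the field names, and the pose layout (translation x, y, z followed by roll, pitch, yaw) are all assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AnchorView:
    """Hypothetical anchor: what exists at a 3D viewpoint and how it should change."""

    rgb: np.ndarray          # image array of shape (H, W, 3)
    pose: np.ndarray         # assumed layout: (x, y, z, roll, pitch, yaw)
    evolution_prompt: str    # text describing the temporal change at this anchor

    def __post_init__(self):
        self.rgb = np.asarray(self.rgb)
        self.pose = np.asarray(self.pose, dtype=float)
        if self.rgb.ndim != 3 or self.rgb.shape[-1] != 3:
            raise ValueError("rgb must have shape (H, W, 3)")
        if self.pose.shape != (6,):
            raise ValueError("pose must hold exactly 6 values (6-DoF)")


if __name__ == "__main__":
    # Placeholder values only; a real anchor would carry an actual image and pose.
    anchor = AnchorView(
        rgb=np.zeros((64, 64, 3), dtype=np.uint8),
        pose=[0.0, 1.5, -2.0, 0.0, 0.0, 0.0],
        evolution_prompt="the candle slowly burns down",
    )
    print(anchor.pose.shape, anchor.evolution_prompt)
```

Validating shapes at construction time keeps several anchors interchangeable when they are later passed to a model as a list.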

Justine Moore@venturetwins

2026.05.19 22:27

I got to "play" a world model in real life. The @GoogleDeepMind folks set up a crazy demo for Genie. You select glowing orbs to represent your scene and character. It loads the world in the model, and you navigate with joysticks like a video game 🕹️

xAI@xai

2026.01.18 19:07

3rd Place: GrokWorld uses Grok Imagine as a world model to generate synthetic training data for robots — augmenting or replacing months of manual collection in hours. @apturaai

HappyOyster@HappyOysterAI

2026.06.17 09:07

HappyOyster 1.0 is now live! HappyOyster 1.0 is an open-ended world model product for real-time world creation and interaction. Create your world now at:
Let's explore together!
Directing:
Real-time interactions: Chat with virtual companions; every prompt changes the experience.
Rewrite story: Pause, rewind, and generate a new path whenever you want.
More ways to play: Virtual pets, dress-up, mystery boxes, and hidden interactions waiting to be discovered!
Adventure:
Explore extraordinary places: From deep ocean-floor ruins to oil paintings or surreal dreamscapes.
Feel the freedom of movement: Skate, parkour, and wingsuit through dynamic worlds.
Open-world interaction: Move freely with WASD controls, jump, hide, and battle enemies, just like playing a game!
Limited-time rewards: Get FREE credits daily until July 17!
Start exploring:
The world is your oyster. Open it.

Diana@sdianahu

2026.06.07 23:46

beyond model size, the more interesting frontier is a thin layer on top: a coding agent that writes an executable world model, checks it against observations, and compresses it toward the *simplest program that fits. it rides every base-model gain for free. A new s-curve sitting on the bitter lesson
*"simplest program" is what ARC-AGI measures in "skill-acquisition efficiency"
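A toy version of the loop in this post (write candidate programs, check them against observations, prefer the simplest that fits) might look like the sketch below. Everything in it is an illustrative assumption: candidates are a fixed list of Python expressions over a state `s` and action `a` standing in for what a coding agent would generate, and "simplest" is approximated by source length.

```python
def fits(program, observations):
    """Return True if the expression predicts every observed next state.

    `program` is a Python expression over `s` (state) and `a` (action).
    Each observation is a (state, action, next_state) triple.
    """
    for s, a, s_next in observations:
        # Evaluate with no builtins; the candidate strings are trusted toy inputs.
        predicted = eval(program, {"__builtins__": {}}, {"s": s, "a": a})
        if predicted != s_next:
            return False
    return True


def simplest_fitting_program(candidates, observations):
    """Pick the shortest candidate consistent with all observations, or None."""
    fitting = [p for p in candidates if fits(p, observations)]
    return min(fitting, key=len) if fitting else None


if __name__ == "__main__":
    # Hypothetical 1-D world: the state moves by the action each step.
    observations = [(0, 1, 1), (1, 1, 2), (2, -1, 1)]
    candidates = ["s", "s * a", "s + a + 0 * s", "s + a"]
    print(simplest_fitting_program(candidates, observations))
```

Source length is only a crude stand-in for program complexity; a description-length measure over a fixed grammar would be a more principled choice.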
