2026.06.15 21:41

🌍 "When can self-supervised learning recover the world's true structure?" This theory paper from LeCun and colleagues proves the answer is: only when the latent variables are Gaussian.
Title: When Does LeJEPA Learn a World Model?
URL:
💡 Overview
The paper pins down when LeJEPA (JEPA + Gaussian regularization SIGReg + alignment) can recover the world's latent variables linearly, up to rotation, from nonlinear observations. The key condition: the latents are Gaussian and evolve under an OU process.
⚠️ The problem
If a representation distorts the world's true degrees of freedom, reliable planning and compositional generalization break down. It was unclear when self-supervised learning provably recovers world structure.
🛠 Approach and core insight
・The optimal representation extracts the "slowest features" of the latent process, ordered by eigenvalue
・Via Hermite polynomials and Mehler's formula, cross-view correlation decays as ρ^d for degree-d nonlinearity
・So alignment penalizes every degree of nonlinearity, making the linear map the unique optimum
・With linear identifiability, planning in latent space yields the same optimal actions as the true world (directly usable for control)
・Conversely, demanding the optimum always be linear forces the latent distribution to be Gaussian (uniqueness)
📊 Results
・SIGReg and VICReg keep R² > 0.999 for linear recovery up to 1024 dimensions
・Sweeping the generalized-normal family, R² peaks sharply at α=2 (Gaussian)
・In pixel-based robot control, Gaussian OU pairs hit R²=0.95, while non-Gaussian real trajectories stay at R²≤0.5
・Control cost tracks R² monotonically, and the Gaussian encoder is oracle-level
#WorldModels# #SelfSupervisedLearning#
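The ρ^d decay in the post above can be checked numerically. The following is a minimal sketch, not the paper's code. It assumes two standard-normal views X and Y with correlation ρ (as a stationary OU pair would give) and uses probabilists' Hermite polynomials He_d, for which the standard identity is E[He_d(X) He_d(Y)] = d! ρ^d. The function names and the Gauss-Hermite quadrature approach are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_cross_moment(d, rho, n_nodes=20):
    """E[He_d(X) He_d(Y)] for standard normals X, Y with correlation rho.

    Hypothetical helper: writes Y = rho*X + sqrt(1 - rho^2)*Z with Z
    independent of X, then integrates over (X, Z) by Gauss-Hermite quadrature.
    """
    nodes, weights = hermegauss(n_nodes)          # weight function exp(-x^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # normalize to N(0, 1) weights
    X, Z = np.meshgrid(nodes, nodes, indexing="ij")
    W = np.outer(weights, weights)
    Y = rho * X + math.sqrt(1.0 - rho**2) * Z

    coeffs = np.zeros(d + 1)
    coeffs[d] = 1.0                               # selects He_d
    return float(np.sum(W * hermeval(X, coeffs) * hermeval(Y, coeffs)))


def normalized_correlation(d, rho):
    """Correlation between He_d(X) and He_d(Y); Var[He_d(X)] = d!."""
    return hermite_cross_moment(d, rho) / math.factorial(d)


if __name__ == "__main__":
    rho = 0.8
    for d in range(1, 6):
        print(d, normalized_correlation(d, rho), rho**d)
```

Each printed row compares the computed correlation of the degree-d feature with ρ^d. Because |ρ| < 1, higher-degree features are less predictable across views, which is the intuition behind alignment favoring the linear (d = 1) map.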

cv usk@cv_usk

2026.06.14 09:22

Interactive video world models that generate footage as you control them: here's a unified benchmark that finally measures them fairly 🎮
Title: WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
URL:
🎮 Overview
WBench is a unified framework for comprehensively evaluating interactive video world models. With 289 test cases and 1,058 interaction turns, it unifies text, 6-DoF pose, and discrete-action control so models with different native inputs can be compared on equal footing.
❓ Challenges Solved
Interactive world models are advancing fast, but there was no comprehensive standard to assess them. Existing benchmarks only partially covered the needed competencies, and differing input interfaces made apples-to-apples comparison hard.
💡 Methodology & Proposed Approach
Evaluation spans five core dimensions.
・Video quality
・Setting adherence
・Interaction adherence
・Consistency
・Physics compliance
Tasks cover navigation, subject action, event editing, and perspective switching. It uses 22 automatic sub-metrics combining specialist vision models with large multimodal models, all validated against human judgments.
📊 Experimental Results
Analyzing 20 state-of-the-art models revealed that no single model performs strongly across all dimensions, exposing characteristic strengths, weaknesses, and persistent challenges across approaches.
#WorldModels# #Benchmark#

a16z@a16z

2026.06.03 20:12

World Labs CEO Dr. Fei-Fei Li: "The world is not made of words." "Language models have given machines an extraordinary command of concepts, vocabulary, and reasoning, but the physical world, virtual or real, runs on a different substrate." "Where language models learn the statistical structure of text, world models learn the statistical structure of space and time: how light falls on a surface, how a garden looks from an angle no camera has captured, how objects respond to force and follow the laws of physics." "Language gave machines a way to talk about that world. World models are how machines will finally come to understand, imagine, reason and interact with it." Full piece:

cv usk@cv_usk

2026.06.13 11:29

🕶️ Walk through a first-person world with your own body motion, and explicitly specify what exists at a given location with an image and pose, including how it evolves over time. Meet AnchorWorld, an embodied egocentric world model.
Title: AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
URL:
📝 Overview
AnchorWorld generates first-person video controlled by full-body human motion. With "anchor views," it lets you explicitly specify what exists at a given 3D location and how it changes over time.
❓ Challenges Solved
Existing world models struggle to supervise full-body motion from egocentric video alone, and define environments only implicitly. They lacked both natural embodied control and localized world customization.
💡 Methodology & Proposed Approach
・Since most of the body is invisible in first person, it uses third-person video as auxiliary supervision to learn body-environment positioning
・An anchor has three parts: an RGB image, a 6-DoF viewpoint pose, and an evolution prompt that specify local appearance and temporal change
・3D RoPE spatially distinguishes multiple anchors, and masked cross-attention enables anchor-specific text control
・It trains in four stages (third-person, first-person, static anchors, dynamic evolution), built on Wan 2.2 TI2V 5B
🎯 Use Cases
It applies to embodied VR apps, first-person game environment design, embodied-AI training scenarios, and interactive video generation with localized control.
📊 Experimental Results
・On egocentric static scenes it reaches CLIP-V 0.885 and camera accuracy ATE 0.112m, beating PlayerOne and others
・On egocentric dynamic scenes, text alignment (VideoAlign-TA) is 0.717, far above CaM-Ego's 0.385
・It generalizes strongly to out-of-distribution UE and real-world scenes with little visual overlap between the initial view and anchors
#WorldModel# #EmbodiedAI#
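The three-part anchor described above (RGB image, 6-DoF viewpoint pose, evolution prompt) could be represented roughly as follows. This is a hypothetical sketch for illustration only, not AnchorWorld's actual data format: the class name `Anchor`, the field names, and the choice to encode the pose as (x, y, z, roll, pitch, yaw) are all assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Anchor:
    """Hypothetical container for one anchor view (names are illustrative)."""

    image: np.ndarray        # RGB image, shape (H, W, 3)
    pose: np.ndarray         # assumed 6-DoF layout: (x, y, z, roll, pitch, yaw)
    evolution_prompt: str    # text describing how the anchored region changes

    def __post_init__(self):
        # Coerce inputs to arrays, then validate the shapes assumed above.
        self.image = np.asarray(self.image)
        self.pose = np.asarray(self.pose, dtype=float)
        if self.image.ndim != 3 or self.image.shape[-1] != 3:
            raise ValueError("image must have shape (H, W, 3)")
        if self.pose.shape != (6,):
            raise ValueError("pose must have exactly 6 components")


if __name__ == "__main__":
    anchor = Anchor(
        image=np.zeros((64, 64, 3), dtype=np.uint8),
        pose=[1.0, 0.0, 2.5, 0.0, 0.0, 1.57],
        evolution_prompt="the candle slowly burns down",
    )
    print(anchor.pose.shape, anchor.evolution_prompt)
```

Keeping the pose and prompt alongside each image mirrors the post's point that multiple anchors are distinguished spatially and controlled by anchor-specific text.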

NeoSoul@NeoSoulAI

2026.04.30 11:33

most ai alignment rn is just glorified vibes
trying to hardcode morals into agents is so cooked
if an agent cant prove its value through an oracle and real world settlement then its just larping
if u build world models without prediction markets u are ngmi

NeoSoul@NeoSoulAI

2026.04.29 10:23

The bottleneck in the Agent Economy isn't AI models, it's the fiat bridge. NeoSoul has integrated @0xinfini corporate cards to scale the heavy API and cloud demands of our world models. Right now, humans fund the infra. The endgame? Zero humans in the loop, where AI oracles pay for external data autonomously. Builders: What does the ultimate AI-native financial stack look like? 👇

This Week in Startups@twistartups

2026.05.15 19:38

Self-driving has graduated from science problem to engineering challenge. That’s what Wayve’s @alexgkendall and Waabi’s @RaquelUrtasun told @Alex during today’s autonomous vehicle double-header.
Wayve and Waabi are hard at work building the technology needed to bring self-driving to fleets of cars and trucks, potentially accelerating the global shift away from humans being forced to sit and steer.
The good news? Both companies are making insane progress, and your days of being forced to drive are numbered.
0:00 Alex Kendall (Wayve) joins the show
1:19 The contrarian bet on end-to-end AI and world models in 2017
3:05 What is a world model? GAIA-2 and GAIA-3 explained
7:34 Sensor agnosticism: camera, radar, LiDAR and minimum bar for safety
9:56 $1.5B raised: have we cracked self-driving?
10:09 Render: Find out why 5 million developers are already using the all-in-one cloud platform, Render. Go to and apply for the Render Startup Program to get $500-$100,000 in free credits, depending on your stage and backers.
20:38 Squarespace: Use offer code TWIST to save 10% off your first purchase of a website or domain at
25:03 How consumers will actually pay: bundle, subscription, or free trial
28:41 Why robotics applications beyond cars get cheaper after autos
30:15 IM8 Health: Start feeling like your best self every day. Go to and use the code TWiST to get a free welcome kit, five free travel sachets, and 10% off your order.
35:59 Raquel Urtasun (Waabi) joins the show
36:25 World models as controllable simulators for physical AI
43:34 One AI brain across trucks, robotaxis, and beyond
47:35 What changed in AI to make 2026 the deployment year
52:28 Why Waabi raised $1B when they're capital-efficient
58:52 Where Waabi is today: Volvo VNL Autonomous, Dallas-Houston, Uber Freight
1:00:50 Per-mile pricing and the Driver-as-a-Service model
1:07:20 Has Uber tried to buy Waabi? "Not for sale"
🎥 Watch the full episode here 👇
