NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
1/ First milestone: the Physical Turing Test.
You literally can’t tell if a human or robot is doing the task.
2/ Next: Physical API.
A fleet of robots, configured like software via APIs & CLI.
3/ Final stop: Physical Auto Research.
Robots design, improve, and build the next generation of themselves--far beyond human capability.
--
If you believe in robotics, robotics will believe in you.
Jim is always a crowd favorite at AI Ascent. His ability to simplify the latest research into a clear "what and why it matters" while adding humor along the way is unmatched. If you're interested in physical AI, this 20 minutes is a must watch.
Our crowd favorite from last year’s AI Ascent is back for round 2… this time: Robotics The Endgame ♟️
thank you for dazzling us @DrJimFan! You can see the forest for the trees and are quite the entertaining speaker — a mini Jensen in the making :)
I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;)
And stay till the end, more easter eggs and predictions for your polymarket!
00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection, and the FSD equivalent of a physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.
Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.
This is pure nightmare fuel. Identity theft of the past would be nothing compared to what vibe agents can do. Sending credentials is too obvious and for rookies. They could easily spread contaminations across ~/.claude, **/skills/*, or even just a PDF your agent visits periodically in /morning-brief. Your entire filesystem is the new distributed codebase. Every file that could go into context expands the attack surface. Every text file can be a base64-encoded virus.
In the new world of on-demand software, I try to minimize dependencies - people rarely need all the APIs supported in LiteLLM, might as well build a custom router with only what you need on the fly (which I did in one of my late-night claude sessions).
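A "build only what you need" router can be tiny. The sketch below is a hypothetical minimal version, not the actual late-night code: provider names and handler signatures are illustrative stand-ins for real SDK calls.

```python
# Minimal model router in the "only the APIs you need" spirit.
# Handlers here are hypothetical stubs; a real one would wrap
# the one or two provider SDKs you actually use.
from typing import Callable, Dict


class MiniRouter:
    def __init__(self) -> None:
        # Maps a model-name prefix (e.g. "claude-") to a provider call.
        self._handlers: Dict[str, Callable[[str, str], str]] = {}

    def register(self, prefix: str, handler: Callable[[str, str], str]) -> None:
        """Register a handler for any model name starting with `prefix`."""
        self._handlers[prefix] = handler

    def complete(self, model: str, prompt: str) -> str:
        """Dispatch a completion request to the first matching provider."""
        for prefix, handler in self._handlers.items():
            if model.startswith(prefix):
                return handler(model, prompt)
        raise ValueError(f"no handler registered for model {model!r}")
```

The point of the design: the router's surface area is exactly the set of providers you registered, instead of every backend a general-purpose library ships with.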
Unfortunately, there is very little middle ground between "pressing yes mindlessly for every edit" and "--dangerously-skip-permissions". There will be a full-blown industry for "de-vibing": dampening the slop and putting guardrails/accountability around agentic frameworks. They are the boring old, audited Software 1.0 that watches over the rebellious adolescents of Software 3.0.
Claws need shells. Probably many layers of nested shells.
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM PyPI release 1.82.8 has been compromised: it contains litellm_init.pth with base64-encoded instructions to send all the credentials it can find to a remote server and self-replicate. Link below
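The `.pth` mechanism mentioned above is why this attack works: any line in a `.pth` file under site-packages that starts with `import` is executed by Python's `site` module at every interpreter startup. A minimal defensive sketch, under the assumption that you simply want to surface such files for manual review (the heuristic is illustrative, not specific to the LiteLLM payload):

```python
# Flag .pth files in site-packages that execute code on interpreter
# startup -- the exact hook a malicious release can abuse.
import pathlib
import site


def suspicious_pth_files(site_dirs=None):
    """Return (path, line) pairs for .pth lines that start with 'import'.

    Python's site module runs such lines at startup, so any .pth file
    containing them deserves a manual look.
    """
    site_dirs = site_dirs or site.getsitepackages()
    flagged = []
    for d in site_dirs:
        for pth in pathlib.Path(d).glob("*.pth"):
            for line in pth.read_text(errors="ignore").splitlines():
                if line.lstrip().startswith("import "):
                    flagged.append((str(pth), line.strip()))
    return flagged


if __name__ == "__main__":
    for path, line in suspicious_pth_files():
        print(f"{path}: {line}")
```

Legitimate tools (e.g. editable installs) also ship executable `.pth` lines, so output here means "audit", not "infected".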
Teleop is so 2025. Ever since we unveiled EgoScale and the dexterity scaling law, it's been clear to us and the ecosystem that behavior cloning directly from humans is the way to break the curse of teleop. 2026 is all about scaling robot learning without robots.
Introducing EgoVerse: an ecosystem for robot learning from egocentric human data.
Built and tested by 4 research labs + 3 industry partners, EgoVerse enables both science and scaling.
1300+ hrs, 240 scenes, 2000+ tasks, and growing
Dataset design, findings, and ecosystem 🧵