I’ve left Google DeepMind after an amazing chapter.
I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale.
As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals.
We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations.
New Google paper: A forecast needs context, not just history.
Some patterns are caused by events, not time. Nexus reframes forecasting as a reasoning problem, where events and numbers have to explain each other.
Nexus argues that forecasting improves when models read the world around the numbers, not just the numbers themselves.
In the Zillow tests, one Claude-based version cut average MAPE by 86.6% versus direct chain-of-thought prompting.
That matters because most time series models are fluent in pattern, but mute about cause.
A housing inventory curve can reflect seasonality, mortgage pressure, migration, layoffs, and local supply, while a stock price can be bent by earnings, regulation, hype, and fear.
Nexus separates those jobs instead of asking one prompt to do everything.
One agent turns messy historical text into a clean event timeline, one reads the broad regime, another tracks local shocks, and a synthesizer reconciles them with calibration from past errors.
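As a rough illustration of that division of labor, here is a toy sketch with no language model calls at all. Every name in it (`Event`, `build_timeline`, `regime_view`, `local_shock_view`, `synthesize`), the way shocks are scored, and the bias-subtraction calibration are assumptions made for this example, not the paper's implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    step: int      # index in the series the event is attached to
    text: str      # cleaned description
    impact: float  # hypothetical signed effect, as a fraction of the level


def build_timeline(raw_notes):
    """Stage 1: turn messy (step, text, impact) notes into a sorted, de-duplicated timeline."""
    seen, events = set(), []
    for step, text, impact in raw_notes:
        key = (step, text.strip().lower())
        if key in seen:
            continue  # drop repeated mentions of the same event
        seen.add(key)
        events.append(Event(step, text.strip(), impact))
    return sorted(events, key=lambda e: e.step)


def regime_view(series):
    """Stage 2: broad regime, taken here as the average step-to-step change."""
    return mean(b - a for a, b in zip(series, series[1:]))


def local_shock_view(series, timeline, window=2):
    """Stage 3: local shocks, taken here as the summed impact of recent events."""
    last = len(series) - 1
    return sum(e.impact for e in timeline if last - e.step < window)


def synthesize(series, trend, shock, past_errors):
    """Stage 4: reconcile both views, then subtract the mean past error (forecast - actual)."""
    raw = (series[-1] + trend) * (1 + shock)
    bias = mean(past_errors) if past_errors else 0.0
    return raw - bias


# Made-up inputs, purely for illustration.
series = [100, 104, 108, 112]
notes = [(3, "Rate hike", -0.05), (3, "rate hike ", -0.05), (0, "New listings surge", 0.02)]

timeline = build_timeline(notes)
forecast = synthesize(
    series,
    regime_view(series),
    local_shock_view(series, timeline),
    past_errors=[2.0, 4.0],
)
print(len(timeline), round(forecast, 2))
```

Keeping each stage as its own function mirrors the point of the post: the numeric trend and the event-driven adjustment are computed separately and only combined at the end.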
The interesting result is not merely that context helps, but that structure helps the language model use context without losing the time series.
The evidence is still narrow: Zillow counts, seven equities, post-cutoff data, and single-run evaluations, so this is not a universal law of forecasting.
But the direction is clear: future forecasters will not only extrapolate curves; they will argue about what made the curve move.
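For readers unfamiliar with the metric behind the 86.6% figure, here is a minimal sketch of MAPE and of a relative reduction between two MAPE scores. The function names and all numbers are made up for illustration and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical values: every baseline forecast is off by 50%, every improved one by 10%.
actual = [100, 200, 400]
baseline_mape = mape(actual, [150, 100, 600])
improved_mape = mape(actual, [110, 180, 440])
print(round(baseline_mape, 2), round(improved_mape, 2))
print(round(relative_reduction(baseline_mape, improved_mape), 2))
```

Note that a reduction is relative to the baseline's error: with these toy numbers, going from roughly 50% MAPE to roughly 10% MAPE is a cut of about 80%, not 40 percentage points of accuracy gained.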
----
Paper Link – arxiv.org/abs/2605.14389
Paper Title: "Nexus: An Agentic Framework for Time Series Forecasting"
NVIDIA just unleashed SANA-WM and it’s an absolute MONSTER for the future of open source AI!
A blazing-fast 2.6B-parameter open-source world model that doesn’t just generate video… it creates controllable, physics-rich, high-fidelity worlds on demand.
Why this is insanely powerful:
• One image + text prompt + 6-DoF camera trajectory → generates 720p videos up to 60 seconds long with buttery-smooth, precisely controlled camera movement. You’re not just watching, you’re piloting the simulation.
• Runs locally on a single consumer GPU (RTX 5090 level) thanks to heavy distillation + NVFP4 quantization. Full 60-second clip denoised in ~34 seconds. No massive clusters required.
• 36× higher throughput than previous open models while rivaling (or beating) closed industrial giants in visual quality and consistency.
• Trained lightning-fast: ~213K public videos in just 15 days on 64 H100s.
• Built with next-level tech: Hybrid Linear Attention, dual-branch camera control, two-stage pipeline, and rock-solid metric-scale pose understanding.
This is a true open world model, the foundation for embodied AI, robotics, autonomous systems, and hyper-realistic simulations that can run anywhere.
Project:
At our Zero-Human Company, we’re already running SANA-WM live in our core pipelines. It’s supercharging autonomous agent training, generating unlimited synthetic training data, and powering full end-to-end simulation loops, zero humans in the loop.
The speed and control let us test thousands of edge-case scenarios overnight, iterate at lightspeed, and push our fully autonomous operations further than ever before.
This is the kind of breakthrough that turns science fiction into daily reality. World models just leveled up — hard.
The age of personal, local, controllable universes is here.
In-context learning in LLMs
@OpenAIDevs Tired of your codex agent dying mid-task? Hire a Goblid™, the goblin in your hinge that won't let the lid close
We are happy to share early results from Logos, our novel first-principles augmented intelligence system, which has enabled insightful results across domains.
We start with a series of results in physics.
Today's is a lovely result hiding in Special Relativity for 121 years.