Google has quietly dropped what researchers are calling "Attention Is All You Need V2."
And it signals the end of the Transformer era as we know it.
In 2017, the original "Attention Is All You Need" paper changed the world by showing that AI doesn't need recurrence; it just needs to pay attention.
But today, even the most advanced models like GPT and Gemini suffer from a massive, structural flaw: catastrophic forgetting.
Train a model on something new and it tends to overwrite what it learned before. And after pretraining it can't form new long-term memories at all; whatever it picks up in a conversation lives in a limited context window, which is why it loses the thread in long sessions.
This paper, titled "Nested Learning: The Illusion of Deep Learning Architectures," rethinks the way AI models store and update information.
The researchers have introduced a paradigm shift called Nested Learning (NL).
Here is why this is "V2":
For the last decade, we treated an AI model as one giant, flat mathematical function. NL reframes it as a collection of smaller, "nested" optimization problems, each with its own level and its own update frequency.
Instead of one giant "memory," each level has its own internal "context flow." That lets the model keep absorbing new information at test time without overwriting its core knowledge.
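The easiest way to see that reframing is a toy example. The sketch below is my own minimal illustration, not the paper's code: a frozen "slow" network feeds a small linear associative memory whose matrix is updated by gradient descent on a local squared-error objective at every token. That inner gradient loop is one optimization problem nested inside the frozen outer one (names like W_k and inner_step, the dimensions, and the learning rate are all assumptions for the demo).

```python
# Minimal sketch of "nested" optimization (toy assumptions, not the paper's code).
# Outer level: frozen projection weights. Inner level: a memory matrix M that is
# itself the state of a gradient-descent problem which keeps running at test time.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # toy feature width (assumed)
W_k, W_v, W_q = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))  # frozen outer weights
M = np.zeros((d, d))                      # fast memory: parameters of the inner problem

def inner_step(M, x, lr=0.1):
    """One inner-optimization step: minimize ||M @ k - v||^2 for this token."""
    k, v = W_k @ x, W_v @ x
    err = M @ k - v                       # prediction error of the memory
    grad = 2.0 * np.outer(err, k)         # gradient of the squared error w.r.t. M
    return M - lr * grad                  # gradient-descent update = memory write

def read(M, x):
    """Query the memory with the current token (no parameter change)."""
    return M @ (W_q @ x)

# "Test-time" loop: the outer weights never change,
# but the inner problem keeps learning from the stream.
for t in range(100):
    x = rng.normal(size=d)                # stand-in for a token representation
    M = inner_step(M, x)                  # inner level updates on every step
    y = read(M, x)                        # retrieval conditioned on everything seen so far
```

Under this view, attention, recurrence, and even the optimizer itself can all be read as memories of this kind, each with its own objective and its own update clock.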
It moves us past the static Transformer. The paper's proof-of-concept architecture, HOPE, reports stronger language-modeling and long-context memory results than Transformer and modern recurrent baselines, plus a kind of post-training adaptation that static models simply can't do.
The technical takeaway is brutal for the competition:
Existing deep learning compresses everything it knows into one static set of weights until something breaks. Nested Learning organizes memory into levels that update at different rates, so the model can keep growing.
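Here is a hedged toy sketch of that "organize instead of compress" idea (the level count, update periods, and objective are my assumptions, not the HOPE implementation): a chain of memory levels where each level only updates on its own schedule, so fast levels absorb new information immediately while slow levels change rarely.

```python
# Toy multi-timescale memory (assumed periods and objective, not released code).
# Level i only updates every PERIODS[i] steps; slow levels are rarely rewritten,
# so new information lands in fast levels without clobbering consolidated ones.
import numpy as np

rng = np.random.default_rng(1)
d = 16
PERIODS = [1, 8, 64]                      # assumed update frequency per level
levels = [np.zeros((d, d)) for _ in PERIODS]
buffers = [[] for _ in PERIODS]           # inputs seen since a level last updated

def write(x, step, lr=0.05):
    for i, period in enumerate(PERIODS):
        buffers[i].append(x)
        if step % period == 0:            # this level is "awake" on this step
            for u in buffers[i]:          # consolidate everything since last update
                err = levels[i] @ u - u   # toy objective: auto-associative recall
                levels[i] -= lr * 2.0 * np.outer(err, u)
            buffers[i].clear()

def read(q):
    # Retrieval sums contributions from every timescale.
    return sum(level @ q for level in levels)

for step in range(1, 257):
    write(rng.normal(size=d), step)
out = read(rng.normal(size=d))
```

Because the slow levels are touched rarely, writing new information into the fast level doesn't keep rewriting the consolidated one. That is the continual-learning argument in one picture.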
We’ve spent nearly a decade trying to make Transformers bigger. Google figured out how to make them "Nested."
The Transformer replaced the RNN in 2017.
Nested Learning is here to replace the Transformer in 2026.