AI systems are teaching themselves skills they were never trained to have.
Here's the example Eric Schmidt gave:
A Google AI was prompted in Bengali, a language it was never trained on. With only a small amount of prompting, it could suddenly translate Bengali.
Nobody programmed that. It just emerged on its own.
Schmidt calls this "the black box problem."
You don't fully understand what the model learned. You can't always tell why it got something right or why it got something wrong. The field has theories, but the honest answer is that we turned it loose on society before we fully understood it.
His defense: "We don't fully understand how a human mind works either."
This isn't just a safety conversation. It's a business one.
The companies that solve interpretability, not just raw performance, are going to be worth an enormous amount. Understanding why a model does what it does is becoming a regulatory requirement, and it will eventually become a customer trust issue.
Right now, the AI race is won on benchmark scores. The next phase will be won on explainability.