Models keep improving on long-horizon tasks, but splitting work across many agents doesn’t suit every problem.
We walk through the setup for a single agent working sequentially on a task where mistakes compound: modeling the early universe.
Read more: