1) Full end-to-end AI software automation is near.
At Nexus, our goal is to achieve full end-to-end automation of our software production system by the end of the year. We are experimenting with this via a fast-iteration, closed-loop, self-improving system. I think it's quite interesting, so I want to share our findings publicly, as I haven't seen many engineering teams discuss results like this yet.
In our tests, each autonomous improvement cycle has the following structure:
- 1. Pre-planning: High-level architectural goals
- 2. Planning: Medium-level planning and spec authoring, including specs for all tests, benchmarks, regression tests, etc.
- 3. Execution: Full end-to-end agentic implementation cycle spanning 1-2 hours of autonomous work. This is the bulk of the productive session, where the actual code is written, tested, and iterated on.
- 4. Evaluation: Complete run of tests, benchmarks, telemetry, and profiling of the post-execution system state
- 5. Retrospective: Thorough retrospective of the end-to-end run, including telemetry of the agent itself, such as cost, tokens used, times context limits were hit, etc.
- 6. Self-improvement: Reasoning based on execution metrics and meta-level metrics about the agentic system itself
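To make the loop concrete, here is a minimal sketch of one improvement cycle in Python. The stage names mirror the list above, but every function body, field, and metric here is a hypothetical placeholder, not our actual implementation:

```python
from dataclasses import dataclass, field

# --- Placeholder stage implementations (illustrative stubs only) ---

def pre_plan() -> list[str]:
    # 1. Pre-planning: high-level architectural goals
    return ["reduce benchmark latency"]

def plan(goals: list[str]) -> dict:
    # 2. Planning: specs, including tests and benchmarks
    return {"goals": goals, "tests": ["bench_latency"]}

def execute(spec: dict) -> dict:
    # 3. Execution: autonomous implementation session (1-2 hours)
    return {"spec": spec, "diff": "<code changes>"}

def evaluate(artifact: dict) -> dict:
    # 4. Evaluation: tests, benchmarks, telemetry, profiling
    return {"tests_passed": True, "benchmarks": {"bench_latency": 0.9}}

@dataclass
class CycleReport:
    generation: int
    tests_passed: bool
    # 5. Retrospective includes telemetry about the agent itself
    agent_metrics: dict = field(default_factory=dict)

def retrospect(generation: int, results: dict) -> CycleReport:
    # Hypothetical agent telemetry: cost, tokens, context-limit hits
    metrics = {"cost_usd": 0.0, "tokens": 0, "context_limit_hits": 0}
    return CycleReport(generation, results["tests_passed"], metrics)

def self_improve(report: CycleReport) -> None:
    # 6. Self-improvement: feed execution and meta-level metrics
    # back into the agentic system before the next generation
    pass

def run_cycle(generation: int) -> CycleReport:
    goals = pre_plan()
    spec = plan(goals)
    artifact = execute(spec)
    results = evaluate(artifact)
    report = retrospect(generation, results)
    self_improve(report)
    return report
```

A driver would then call `run_cycle` once per generation, carrying the retrospective forward as input to the next cycle's pre-planning.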
The view below shows a 3D visualization of 29 generations of a self-improving codebase authored by our E2E automation system.