1) Full end-to-end AI software automation is near.
At Nexus, our goal is to achieve full end-to-end automation of our software production system by the end of the year. We are experimenting with this via a fast-iteration, closed-loop, self-improving system. I think it's quite interesting, so I want to share our findings publicly, as I haven't seen many engineering teams discuss results like this yet.
In our tests, each autonomous improvement cycle has the following structure:
- 1. Pre-planning: High-level architectural goals
- 2. Planning: Medium-level planning and spec authoring, including specs for all tests, benchmarks, regression tests, etc.
- 3. Execution: Full end-to-end agentic implementation cycle spanning 1-2 hours of autonomous work. This is the bulk of the productive session, where the actual code is written, tested, and iterated on.
- 4. Evaluation: Complete run of tests, benchmarks, telemetry, and profiling of the post-execution system state
- 5. Retrospective: Thorough retrospective of the end-to-end run, including telemetry of the agent itself, such as cost, tokens used, times context limits were hit, etc.
- 6. Self-improvement: Reasoning based on execution metrics and meta-level metrics about the agentic system itself
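To make the loop concrete, here is a minimal sketch of one improvement cycle in Python. The stage names mirror the list above, but every function body, field, and metric here is a hypothetical placeholder, not our actual implementation:

```python
from dataclasses import dataclass, field

# --- Placeholder stage implementations (illustrative stubs only) ---

def pre_plan() -> list[str]:
    # 1. Pre-planning: high-level architectural goals
    return ["reduce benchmark latency"]

def plan(goals: list[str]) -> dict:
    # 2. Planning: specs, including tests and benchmarks
    return {"goals": goals, "tests": ["bench_latency"]}

def execute(spec: dict) -> dict:
    # 3. Execution: autonomous implementation session (1-2 hours)
    return {"spec": spec, "diff": "<code changes>"}

def evaluate(artifact: dict) -> dict:
    # 4. Evaluation: tests, benchmarks, telemetry, profiling
    return {"tests_passed": True, "benchmarks": {"bench_latency": 0.9}}

@dataclass
class CycleReport:
    generation: int
    tests_passed: bool
    # 5. Retrospective includes telemetry about the agent itself
    agent_metrics: dict = field(default_factory=dict)

def retrospect(generation: int, results: dict) -> CycleReport:
    # Hypothetical agent telemetry: cost, tokens, context-limit hits
    metrics = {"cost_usd": 0.0, "tokens": 0, "context_limit_hits": 0}
    return CycleReport(generation, results["tests_passed"], metrics)

def self_improve(report: CycleReport) -> None:
    # 6. Self-improvement: feed execution and meta-level metrics
    # back into the agentic system before the next generation
    pass

def run_cycle(generation: int) -> CycleReport:
    goals = pre_plan()
    spec = plan(goals)
    artifact = execute(spec)
    results = evaluate(artifact)
    report = retrospect(generation, results)
    self_improve(report)
    return report
```

A driver would then call `run_cycle` once per generation, carrying the retrospective forward as input to the next cycle's pre-planning.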
The view below shows a 3D visualization of 29 generations of a self-improving codebase authored by our E2E automation system.