An AI agent's performance is governed not by how much it computes, but by how well that compute turns into good feedback 📈
Title: Scaling Laws for Agent Harnesses via Effective Feedback Compute
📈 Overview
This work proposes Effective Feedback Compute (EFC), a metric that reframes agent scaling efficiency around feedback quality rather than raw compute. It measures whether computation actually improved decisions.
❓ Challenges Solved
Agent performance is usually discussed in terms of raw metrics: tokens, tool calls, cost. These hide whether feedback actually improved decision-making. Feedback that is redundant, invalid, or never used does not help.
💡 Methodology & Proposed Approach
・EFC credits feedback only when it is informative, valid, non-redundant, and retained for later decisions
・It normalizes by task demands for fair cross-task comparison
・Evaluated on synthetic tasks, code tasks, real traces, and prospective tests, against raw-compute and SAS baselines
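To make the crediting rule concrete, here is a minimal Python sketch of one way it could be computed. This is not the paper's implementation: the `FeedbackEvent` schema, the `key`-based redundancy check, and the scalar `task_demand` divisor are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    """One piece of feedback seen by the agent (hypothetical schema)."""
    key: str           # what the feedback says; used here to detect repeats
    informative: bool  # relevant to a pending decision?
    valid: bool        # correct and well-formed?
    retained: bool     # still available when a later decision was made?


def effective_feedback_compute(events, task_demand):
    """Count events that pass all four criteria, divided by task demand.

    Assumption: redundancy means "same key as an already credited event",
    and task demand is a single positive number.
    """
    if task_demand <= 0:
        raise ValueError("task_demand must be positive")
    credited_keys = set()
    for event in events:
        redundant = event.key in credited_keys
        if event.informative and event.valid and event.retained and not redundant:
            credited_keys.add(event.key)
    return len(credited_keys) / task_demand


# Toy trace, invented for illustration only.
events = [
    FeedbackEvent("test_fail:parse", True, True, True),   # credited
    FeedbackEvent("test_fail:parse", True, True, True),   # redundant repeat
    FeedbackEvent("lint:unused", True, False, True),      # invalid
    FeedbackEvent("test_pass:io", True, True, False),     # not retained
    FeedbackEvent("trace:timeout", True, True, True),     # credited
]
score = effective_feedback_compute(events, task_demand=4.0)
```

In this toy trace only two of the five events earn credit, so three of the five feedback steps add raw compute without adding effective feedback.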
📊 Experimental Results
EFC explained performance far better than raw compute did (figures below are R² against performance).
・Raw tokens/tool calls: R²=0.33-0.42
・SAS baseline: 0.88, Oracle-EFC: 0.94, task-normalized: 0.99
・Real traces: 0.92, prospective holdout: 0.85
・Matched-budget interventions that improved feedback quality lifted success from 0.27 to 0.90
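For readers who want to see what the R² comparison measures, here is a small Python sketch that computes R² for a single predictor under an ordinary least-squares line. The paper's actual fitting procedure is not described here, so the linear fit is an assumption, and the numbers are toy values invented for illustration, not data from the paper.

```python
def r_squared(x, y):
    """R² of a least-squares line y ~ a + b*x (squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)


# Toy values, made up for illustration only.
success = [0.2, 0.4, 0.6, 0.8]   # per-run success rate
tokens = [500, 900, 400, 1000]   # raw compute proxy
efc = [1.0, 2.0, 3.0, 4.0]       # effective-feedback proxy

r2_tokens = r_squared(tokens, success)
r2_efc = r_squared(efc, success)
```

A higher R² means the predictor accounts for more of the variation in success across runs, which is the sense in which the results compare EFC with raw tokens and tool calls.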
#AIAgents #ScalingLaws