🔎 LLM agents rewrite a decompiler's unreadable, `local_48`-laden output into readable code while preserving behavior, but optimizing a single metric collapses into "gaming." The fix is a multidimensional readability score.
Title: LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics
URL:
📝 Overview
This paper has LLM agents improve the readability of decompiled binaries while keeping functional correctness. The key is QRS, a multidimensional score combining structural validation with three readability sub-metrics.
❓ Challenges Solved
Automated decompilers produce functionally correct but unreadable code. LLMs asked to clean it up lose focus without quantitative guidance, and optimizing a single metric leads to "gaming" that sacrifices the other dimensions of readability.
💡 Methodology & Proposed Approach
・QRS multiplies a structural gate by a composite score; the composite is a weighted sum of lexical surprisal, structural simplicity, and idiomatic quality
・Lexical surprisal uses a small code-LLM's perplexity to measure how familiar the code looks
・Structural simplicity uses cyclomatic complexity and nesting depth; idiomatic quality uses clang-tidy anti-pattern checks
・QRS is computed only if the recompiled code reaches a CFG similarity of at least 0.85 to the original binary, as measured with radare2
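
The gate-times-composite structure described in the bullets above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 0.85 threshold comes from the summary, but the equal weights, the binary (0 or 1) gate, the assumption that each sub-metric is already normalized to [0, 1] with higher meaning more readable, and all names (`SubScores`, `structural_gate`, `qrs`) are hypothetical.

```python
from dataclasses import dataclass

# Threshold stated in the summary: minimum CFG similarity to the original binary.
CFG_SIMILARITY_THRESHOLD = 0.85


@dataclass(frozen=True)
class SubScores:
    """Readability sub-metrics, ASSUMED normalized to [0, 1], higher = more readable."""
    lexical: float     # derived from a small code-LLM's perplexity
    structural: float  # derived from cyclomatic complexity and nesting depth
    idiomatic: float   # derived from clang-tidy anti-pattern checks


# HYPOTHETICAL weights (lexical, structural, idiomatic); the summary does not give them.
EQUAL_WEIGHTS = (1 / 3, 1 / 3, 1 / 3)


def structural_gate(cfg_similarity: float) -> float:
    """ASSUMED binary gate: 1.0 if the rewrite keeps the control flow, else 0.0."""
    return 1.0 if cfg_similarity >= CFG_SIMILARITY_THRESHOLD else 0.0


def qrs(cfg_similarity: float, scores: SubScores, weights=EQUAL_WEIGHTS) -> float:
    """Gate multiplied by the weighted sum of the three sub-metrics."""
    w_lex, w_struct, w_idiom = weights
    composite = (
        w_lex * scores.lexical
        + w_struct * scores.structural
        + w_idiom * scores.idiomatic
    )
    return structural_gate(cfg_similarity) * composite


if __name__ == "__main__":
    candidate = SubScores(lexical=0.9, structural=0.6, idiomatic=0.75)
    print(qrs(0.92, candidate))  # passes the gate, so the composite is returned
    print(qrs(0.80, candidate))  # fails the gate, so the score is zeroed
```

Because the gate multiplies the composite, a rewrite that looks cleaner but changes the control flow scores zero no matter how well it does on the readability sub-metrics, which is what makes single-dimension gaming unprofitable in this sketch.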
🎯 Use Cases
It directly speeds up reading decompiler output in malware analysis, vulnerability research, legacy-software comprehension, and patch diffing.
📊 Experimental Results
・On 210 synthetic C binaries, the LLM-only agent reached a QRS of at least 0.75 in 74.76% of cases, raising QRS by +0.420 on average with zero regressions
・Allowing Bash execution raised the success rate to 82%, improved QRS by +0.509, and cut iterations from 5.92 to 2.933 (roughly a 50% reduction)
・The study empirically shows that a multidimensional score avoids the failure mode described by Goodhart's Law, "when a measure becomes a target, it stops being a good measure"
#ReverseEngineering #AIAgents