cv usk(@cv_usk):
🔎 LLM agents rewrite a decompiler's unreadable, `local_48`-laden output into readable code while preserving behavior, but optimizing a single metric collapses into "gaming." The fix is a multidimensional readability score.

Title: LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics
URL: https://t.co/6VZUepWUeh

📝 Overview
This paper has LLM agents improve the readability of decompiled binaries while keeping them functionally correct. The key is QRS, a multidimensional score that combines structural validation with three readability sub-metrics.

❓ Challenges Solved
Automated decompilers produce code that is functionally correct but unreadable. LLMs asked to clean it up lose focus without quantitative guidance, and optimizing a single metric leads to "gaming" that sacrifices the other dimensions.

💡 Methodology & Proposed Approach
・QRS is a structural gate multiplied by a composite score; the composite is a weighted sum of lexical surprisal, structural simplicity, and idiomatic quality
・Lexical surprisal uses the perplexity of a small code LLM to measure how familiar the code looks
・Structural simplicity uses cyclomatic complexity and nesting depth; idiomatic quality uses clang-tidy anti-pattern checks
・QRS is computed only if the recompiled code reaches at least 0.85 CFG similarity to the original binary, as measured with radare2

🎯 Use Cases
It directly speeds up reading decompiler output in malware analysis, vulnerability research, legacy-software comprehension, and patch diffing.

📊 Experimental Results
・On 210 synthetic C binaries, the LLM-only setup reached a QRS of at least 0.75 in 74.76% of cases, with an average QRS gain of +0.420 and zero regressions
・Allowing Bash execution raised that rate to 82%, improved QRS by +0.509, and cut iterations from 5.92 to 2.933 (a 43% reduction)
・It shows empirically that a multidimensional score avoids Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure"

#ReverseEngineering #AIAgents
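The QRS definition in the methodology above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the structural gate is binary (1 at or above the 0.85 CFG-similarity threshold, 0 below), that each sub-metric has already been normalized to [0, 1] with higher meaning more readable, and that the weights are equal. The names `SubScores` and `qrs` and the default weights are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the post; everything else here is an assumption.
CFG_SIMILARITY_THRESHOLD = 0.85


@dataclass(frozen=True)
class SubScores:
    """Hypothetical container; each value is assumed normalized to [0, 1]."""
    lexical: float     # derived from code-LLM perplexity (lower surprisal -> higher score)
    structural: float  # derived from cyclomatic complexity and nesting depth
    idiomatic: float   # derived from clang-tidy anti-pattern findings


def qrs(cfg_similarity: float, scores: SubScores,
        weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Structural gate times a weighted sum of the three sub-scores."""
    # Assumed binary gate: code that drifts from the original CFG scores zero.
    gate = 1.0 if cfg_similarity >= CFG_SIMILARITY_THRESHOLD else 0.0
    w_lex, w_struct, w_idiom = weights
    total = w_lex + w_struct + w_idiom
    composite = (w_lex * scores.lexical
                 + w_struct * scores.structural
                 + w_idiom * scores.idiomatic) / total
    return gate * composite


# A rewrite that fails the gate scores 0.0 no matter how readable it looks.
rejected = qrs(0.80, SubScores(0.9, 0.9, 0.9))

# Maxing one dimension while neglecting the others ("gaming") scores lower
# than a balanced rewrite under the composite.
gamed = qrs(0.95, SubScores(1.0, 0.2, 0.2))
balanced = qrs(0.95, SubScores(0.7, 0.7, 0.7))
```

Under these assumptions, a rewrite cannot raise its score by pushing one sub-metric to the maximum while the others degrade, which is the anti-gaming property the post attributes to the multidimensional score.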

2026.06.13 13:30
