cv usk (@cv_usk):

🖼 Test-time scaling for image editing tends to hand every edit the same compute budget, wasting a lot of it. By allocating budget by difficulty and pruning with edit-specific verification, this work hits up to 2.2x speedup while preserving quality.

Title: From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
URL: https://t.co/ymNAKCiOQ6

📝 Overview
ADE-CoT is a test-time scaling method tailored to goal-directed image editing. Instead of reusing Image-CoT methods built for text-to-image generation, it combines three strategies (difficulty-aware allocation, edit-specific early verification, and opportunistic stopping) to cut compute substantially while preserving quality.

❓ Challenges Solved
Prior methods had three mismatches.
・Fixed sampling budgets waste compute on easy edits that barely improve
・General MLLM scores wrongly prune about 40% of samples that start low but ultimately score high
・Large-scale sampling produces redundant identical correct outputs, adding needless compute

💡 Methodology & Proposed Approach
・It reads edit difficulty, giving easy edits a minimal budget and expanding the search for hard ones
・A one-step preview estimates clean latents from noisy intermediates without extra denoising, making early verification reliable
・Grounded SAM2 checks that only the intended region changed, and DINOv2 embeddings remove redundant candidates
・It generates candidates sequentially and stops, via depth-first opportunistic stopping, once enough intent-aligned results are found

🎯 Use Cases
It fits complex pose changes, multi-object removal or replacement, fine-grained regional edits, multi-turn editing, and high-quality editing under compute constraints, and is especially valuable where inference cost matters, like a production image-editing API.
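
For readers who want to see how the methodology bullets could fit together, here is a rough Python sketch of the control flow only. It is not the paper's implementation. Every name in it (`budget_for`, `adaptive_search`, `Candidate`), the linear budget rule, the thresholds, and the step counts are assumptions made up for illustration. The scores and embeddings are plain numbers standing in for a real verifier and for DINOv2 features.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    seed: int
    embedding: list[float]  # stand-in for an image embedding (e.g. DINOv2 features)
    early_score: float      # stand-in for an edit-specific early verification score
    final_score: float      # stand-in for the final verifier score


def budget_for(difficulty: float, min_budget: int = 2, max_budget: int = 32) -> int:
    """Hypothetical linear rule: easy edits get few samples, hard edits get many."""
    d = min(max(difficulty, 0.0), 1.0)
    return min_budget + round(d * (max_budget - min_budget))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def adaptive_search(sample, difficulty, prune_below=0.4, accept_at=0.8,
                    enough=2, dup_sim=0.98, early_steps=4, full_steps=28):
    """Generate candidates one at a time; prune, deduplicate, and stop early.

    `sample(seed)` is a caller-supplied function returning a Candidate.
    Returns (best candidate or None, candidates started, denoising steps spent).
    All thresholds and step counts are illustrative assumptions.
    """
    kept, started, cost = [], 0, 0
    for seed in range(budget_for(difficulty)):
        started += 1
        cand = sample(seed)
        cost += early_steps  # every candidate pays for the early preview
        if cand.early_score < prune_below:
            continue  # pruned by early verification
        if any(cosine(cand.embedding, k.embedding) >= dup_sim for k in kept):
            continue  # near-duplicate of a candidate we already kept
        cost += full_steps - early_steps  # finish denoising the survivor
        kept.append(cand)
        if sum(k.final_score >= accept_at for k in kept) >= enough:
            break  # opportunistic stop: enough good results found
    best = max(kept, key=lambda c: c.final_score, default=None)
    return best, started, cost


def toy_sample(seed: int) -> Candidate:
    """Random stand-in for an editing model plus verifier."""
    rng = random.Random(seed)
    final = rng.random()
    early = min(1.0, max(0.0, final + rng.uniform(-0.2, 0.2)))
    return Candidate(seed, [rng.random(), rng.random(), rng.random()], early, final)


if __name__ == "__main__":
    for difficulty in (0.1, 0.9):
        best, started, cost = adaptive_search(toy_sample, difficulty)
        print(difficulty, started, cost, None if best is None else round(best.final_score, 2))
```

With the hypothetical step counts above, a fixed Best-of-N run would cost N × 28 steps. In this sketch the savings come from three places: a smaller budget for easy edits, the cheap early exit for pruned or duplicate candidates, and the break once enough good candidates exist.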

📊 Experimental Results
・On GEdit-Bench, FLUX.1 Kontext is 2.2x, BAGEL 1.8x, and Step1X-Edit 2.0x faster than Best-of-N
・Reasoning efficiency more than doubles on a fixed 32-sample budget, and outcome efficiency rises 4.9x, 2.7x, and 2.9x across three benchmarks
・On hard multi-object edits like "remove the person standing next to the lady in white," it fixes the baseline's misidentification

#ImageEditing #DiffusionModels

2 hours ago

