🎨 The reason diffusion models lose quality during generation turns out to be an "SNR vs timestep" mismatch. A training-free correction cuts FID by up to 47%.
Title: Elucidating the SNR-t Bias of Diffusion Probabilistic Models
URL:
📝 Overview
During training, a diffusion model's signal-to-noise ratio (SNR) is deterministically tied to the timestep. During generation, accumulated errors break that coupling, so a sample's SNR no longer matches its assigned timestep. The paper calls this mismatch the "SNR-t bias," explains the mechanism behind it, and proposes a correction called DCW.
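To make the "SNR is tied to the timestep" point concrete, here is a minimal sketch assuming a standard DDPM-style linear beta schedule, where SNR(t) = ᾱ_t / (1 − ᾱ_t). The schedule values and the helper names `snr_schedule` and `timestep_for_snr` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def snr_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """SNR at every timestep for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention
    return alpha_bar / (1.0 - alpha_bar)  # signal power / noise power

def timestep_for_snr(snr_value, snr):
    """Index of the timestep whose nominal SNR is closest (in log space)."""
    return int(np.argmin(np.abs(np.log(snr) - np.log(snr_value))))

snr = snr_schedule()
t = 500
# Hypothetical sample whose SNR is 20% lower than the schedule says it
# should be at step t: it "looks like" a later, noisier timestep.
matched = timestep_for_snr(0.8 * snr[t], snr)
print(t, matched)
```

In the forward process the mapping from t to SNR is one-to-one, so any drift in a sample's actual SNR during generation means the network is conditioned on a timestep that no longer describes its input.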
❓ Challenges Solved
Reverse-denoising samples consistently have lower SNR than forward samples at the same timestep. As a result, the network systematically overestimates its outputs, degrading generation quality.
💡 Methodology & Proposed Approach
・At each denoising step it applies a differential correction using the difference between the predicted and reconstructed samples
・It works in the wavelet domain, leveraging how diffusion models reconstruct low frequencies first and high-frequency detail later
・A discrete wavelet transform splits samples into frequency subbands, with low-frequency weights decaying over time and high-frequency weights increasing
・It is training-free and plug-and-play, working with IDDPM, EDM, DDIM, FLUX, and many others
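The bullets above can be sketched roughly as follows. This is only an illustration of a frequency-weighted differential correction, not the paper's exact DCW update: the single-level Haar transform, the linear weight schedules, the `scale` parameter, and the sign of the correction are all assumptions, and every function name here is hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # high-frequency detail bands
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def band_weights(progress, scale=0.1):
    """Assumed schedules: progress is 0 at the first denoising step and 1
    at the last. The low-frequency weight decays, the high-frequency
    weight grows."""
    return scale * (1.0 - progress), scale * progress

def wavelet_differential_correction(predicted, reconstructed, progress, scale=0.1):
    """Add a per-subband weighted difference signal to the predicted sample.
    The direction of the correction is an assumption of this sketch."""
    w_low, w_high = band_weights(progress, scale)
    ll, lh, hl, hh = haar_dwt2(predicted - reconstructed)
    correction = haar_idwt2(w_low * ll, w_high * lh, w_high * hl, w_high * hh)
    return predicted + correction

rng = np.random.default_rng(0)
predicted = rng.standard_normal((8, 8))      # stand-in for the step's output
reconstructed = rng.standard_normal((8, 8))  # stand-in for the reconstruction
corrected = wavelet_differential_correction(predicted, reconstructed, progress=0.25)
print(corrected.shape)
```

Because the correction only needs a wavelet transform and a few elementwise operations per step, with no extra network evaluations, a scheme like this can be bolted onto an existing sampler without retraining.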
🎯 Use Cases
It can boost the quality of existing pretrained diffusion models after the fact, which makes it a good fit when you want high-quality generation with few sampling steps.
📊 Experimental Results
・On IDDPM (CIFAR-10) it cuts 20-step FID by 42.6%
・On EDM (CIFAR-10) it reduces FID by 47.1% / 47.4% / 36.4% at 13/21/35 NFE
・It adds further gains even on top of SOTA bias-correction methods (A-DPM-FR improves from 12.38 to 10.91 FID at 10 steps)
・Compute overhead is tiny: ~0.47% on CelebA and 0.08% on ImageNet
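The percentages above are relative FID reductions. As a quick sanity check on how such figures are derived, the small helper below (a hypothetical name, not from the paper) applies the formula to the one before/after pair quoted in this summary.

```python
def relative_reduction(before, after):
    """Percentage drop from `before` to `after` (lower FID is better)."""
    return 100.0 * (before - after) / before

# A-DPM-FR at 10 steps, using the FID values quoted above.
print(f"{relative_reduction(12.38, 10.91):.1f}%")  # about 11.9%
```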
#DiffusionModels #GenerativeAI