cv usk(@cv_usk):
🪑 Insert an object into an image while specifying its exact 3D orientation and position. DIRECT addresses 3D pose control, which text leaves ambiguous and abstract parameters handle poorly, by using decomposed visual proxies.
Title: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
URL: https://t.co/kztx9c01ip

📝 Overview
DIRECT is a diffusion-based method that inserts a reference object into an image with explicit control over its 3D pose and position. It decomposes the insertion condition into geometry, appearance, and context, each injected through an independent pathway.

❓ Challenges Solved
Existing insertion methods formulate the task as 2D inpainting and cannot control 3D pose. Text guidance is spatially ambiguous, and parametric 3D methods fail to translate abstract pose parameters into correct geometric projections.

💡 Methodology & Proposed Approach
・A user-manipulated 3D proxy, rendered at the target pose, provides geometry guidance
・Appearance (the reference's high-fidelity look) and context (background semantics) are injected independently via separate LoRA adapters and positional embeddings to avoid feature entanglement
・TRELLIS lifts the reference image into a coarse 3D shape, which is then refined with VGGT and 3D Gaussian Splatting
・Built on FLUX.1-Fill, it uses shape-decomposed mask augmentation and progressive-resolution training to avoid overfitting

🎯 Use Cases
It suits virtual staging, e-commerce product photography, creative work that needs precise spatial control, and photorealistic AR/VR content generation.

📊 Experimental Results
・On the FLUX backbone it reaches PSNR 23.09, LPIPS 0.147, and a matching error of 17.8, beating the baselines on all metrics
・It stays stable across large pose changes (0 to 180 degrees) and preserves fine details even when the 3D reconstruction is degraded
・Hybrid-data training raised CLIP-I from 0.904 to 0.943
・For orienting symmetric objects, RGB geometry guidance outperformed normal maps

#3DGeneration #ImageEditing

2026.06.15 08:51
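The idea of injecting each condition through its own pathway can be pictured with a small toy sketch. This is a hypothetical illustration, not DIRECT's code: the stream names, the `POSITION_OFFSETS` values, the `LoRA`/`Token` classes, and the choice to let the geometry stream reuse the target's positions are all assumptions. It only shows the general pattern of a shared frozen projection plus a per-stream low-rank update, with each condition stream placed in its own position range so the model can tell the streams apart.

```python
# Hypothetical sketch (not DIRECT's implementation): route decomposed condition
# streams through a shared frozen projection plus a per-stream LoRA-style update,
# and give each stream its own positional range.
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRA:
    down: Matrix  # r x d: maps features into the low-rank space
    up: Matrix    # d x r: maps back to model width
    scale: float = 1.0

    def delta(self, x: Vector) -> Vector:
        """Low-rank update: scale * up @ (down @ x)."""
        return [self.scale * y for y in matvec(self.up, matvec(self.down, x))]


@dataclass
class Token:
    stream: str      # "target", "geometry", "appearance" or "context"
    position: int    # index that would feed the positional embedding
    features: Vector


# Assumed offsets: the geometry proxy is rendered at the target pose, so here it
# reuses the target's positions; appearance and context get separate ranges.
POSITION_OFFSETS = {"target": 0, "geometry": 0, "appearance": 10_000, "context": 20_000}


def build_sequence(streams: dict[str, list[Vector]]) -> list[Token]:
    """Flatten the streams into one token sequence, tagging stream and position."""
    tokens = []
    for name, feats in streams.items():
        offset = POSITION_OFFSETS[name]
        tokens.extend(Token(name, offset + i, f) for i, f in enumerate(feats))
    return tokens


def project(tokens: list[Token], base: Matrix, adapters: dict[str, LoRA]) -> list[Vector]:
    """Shared projection for every token, plus its own stream's adapter (if any)."""
    out = []
    for t in tokens:
        y = matvec(base, t.features)
        adapter = adapters.get(t.stream)
        if adapter is not None:
            y = [a + b for a, b in zip(y, adapter.delta(t.features))]
        out.append(y)
    return out


# Toy usage: 2-d features, rank-1 adapters for appearance and context only.
identity = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "appearance": LoRA(down=[[1.0, 0.0]], up=[[0.5], [0.0]]),
    "context": LoRA(down=[[0.0, 1.0]], up=[[0.0], [2.0]]),
}
seq = build_sequence({s: [[1.0, 1.0]] for s in ("target", "geometry", "appearance", "context")})
outs = project(seq, identity, adapters)
print([t.position for t in seq])  # [0, 0, 10000, 20000]
print(outs)  # [[1.0, 1.0], [1.0, 1.0], [1.5, 1.0], [1.0, 3.0]]
```

Keeping the base projection shared and frozen while each stream trains only its own low-rank update is one way to keep appearance and context features from entangling. The post does not say whether the geometry pathway also gets an adapter, so the sketch leaves it without one.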
