cv usk(@cv_usk):
🏠 Just specify furniture with text or images, and a style-consistent 3D indoor scene is generated automatically, with texture-free inference taking about 85% less time than MMGDreamer.

Title: FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow
URL: https://t.co/NYsZPmllEX

📝 Overview
FlowScene generates high-fidelity 3D indoor scenes from a multimodal scene graph that fuses text and images. It produces layout, shape, and texture in three branches via a straight-line rectified flow, keeping style consistent across the whole scene.

❓ Challenges Solved
Language-driven retrieval methods lack object-level control and style coherence, while graph-based methods struggle to produce high-quality textures. FlowScene addresses both weaknesses at once.

💡 Methodology & Proposed Approach
・The input is a multimodal graph whose nodes fuse text descriptions and image features (text-only, image-only, or mixed)
・An InfoExchangeUnit densely exchanges node information during sampling so that both per-object and scene-wide conditions are satisfied
・Layout (3D boxes), shape (VQ-VAE latents), and texture (anchored to geometry) are generated by independent denoisers
・Texture is denoised with geometry fixed, so even text-only nodes get style-consistent textures through information exchange

🎯 Use Cases
It fits interactive scene design for interior design and manufacturing, VR/AR content creation, and building simulation environments for robotics.

📊 Experimental Results
・Bedroom FID drops from 42.38 to 35.01 (lower is better), a 17.4% improvement over MMGDreamer
・CLIPScore of 0.2386 is the best of all methods, and users rate style consistency 8.72/10
・Inference without textures takes 6.83s versus MMGDreamer's 45.34s, about 85% less time
・Object quality also improves, e.g. a 43.90% better minimum matching distance on nightstands

#3DGeneration #GenerativeAI

2026.06.13 08:29
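
🧪 Sketch: what "straight-line rectified flow" sampling looks like
This is a minimal toy sketch, not FlowScene's actual code. Rectified flow moves a sample from noise (t=0) to data (t=1) by integrating a velocity field, and when the path is a straight line a plain Euler solver needs only a few steps. Everything named here is an assumption for illustration: `toy_velocity` is a hypothetical stand-in for a learned denoiser, `target_boxes` stands in for the layout the graph conditions would imply, and the 6-number box format is made up.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned denoiser.

    On the straight path x_t = (1 - t) * x0 + t * target, this expression
    equals the constant velocity target - x0.
    """
    return (target - x) / (1.0 - t)


def sample_rectified_flow(x0, target, num_steps=10):
    """Euler integration of dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        x = x + dt * toy_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
num_nodes, box_dim = 4, 6  # assumed: 4 furniture nodes, 6 numbers per 3D box
noise = rng.standard_normal((num_nodes, box_dim))
target_boxes = rng.uniform(-1.0, 1.0, size=(num_nodes, box_dim))

boxes = sample_rectified_flow(noise, target_boxes, num_steps=10)
print(np.allclose(boxes, target_boxes))
```

Because the toy velocity is constant along the straight path, the result matches the target even with `num_steps=1`. A learned field is only approximately straight, so real samplers still use more than one step.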
