🏠 Just specify furniture with text or images, and get a style-consistent 3D indoor scene generated automatically, about 85% faster than MMGDreamer.
Title: FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow
URL:
📝 Overview
FlowScene generates high-fidelity 3D indoor scenes from a multimodal scene graph that fuses text and images. It produces layout, shape, and texture in three separate branches using rectified flow, which learns near-straight paths from noise to data, and keeps style consistent across the whole scene.
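To make the "straight-line" idea concrete, here is a minimal toy sketch of generic rectified flow in pure Python. It is not FlowScene's code: the vectors, the step count, and the `ideal` velocity (which cheats by knowing the target, where a real model would use a trained network) are all illustrative assumptions.

```python
# Toy rectified-flow sketch (illustrative only, not FlowScene's implementation).
# Training: regress a velocity v(x, t) onto x1 - x0 along x_t = (1 - t) * x0 + t * x1.
# Sampling: integrate dx/dt = v(x, t) from noise (t = 0) to data (t = 1).
from typing import Callable, List

Vector = List[float]

def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point at time t on the straight line from noise x0 to data x1."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the velocity model: constant along the line."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(velocity: Callable[[Vector, float], Vector], x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Usage: x1 stands in for a hypothetical 3D box center. The ideal velocity
# (x1 - x) / (1 - t) keeps the sample on the straight line, so a few Euler
# steps reach x1.
x0, x1 = [0.0, 0.0, 0.0], [1.0, 0.5, 2.0]

def ideal(x: Vector, t: float) -> Vector:
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

print(euler_sample(ideal, x0, steps=4))  # [1.0, 0.5, 2.0]
```

Straight paths are what make few-step sampling viable. With an imperfect learned velocity the path bends somewhat, which is consistent with the summary's emphasis on fast inference.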
❓ Challenges Solved
Language-driven retrieval methods lack object-level control and style coherence, while graph-based methods struggle with high-quality textures. FlowScene resolves both weaknesses at once.
💡 Methodology & Proposed Approach
・The input is a multimodal scene graph whose nodes carry a text description, image features, or both (text-only, image-only, or mixed nodes)
・An InfoExchangeUnit densely exchanges information between nodes during sampling, so that both individual (per-object) and holistic (scene-level) conditions are satisfied
・Layout (3D boxes), shape (VQ-VAE latents), and texture (anchored to geometry) are generated by independent denoisers
・Texture is denoised with geometry fixed, so even text-only nodes get style-consistent textures through information exchange
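The node handling described above can be pictured with a small hypothetical sketch. The summary does not give the InfoExchangeUnit's internals, so everything here is an assumption: fusion by averaging whichever modalities a node has, and dense exchange as one round of dot-product attention over all nodes with a residual update. It only illustrates how a text-only node can pick up image-derived information from other nodes.

```python
# Hypothetical sketch, not FlowScene's InfoExchangeUnit.
# Assumptions: per-node fusion = mean of available modality embeddings;
# dense exchange = all-to-all softmax attention plus a residual connection.
import math
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]

@dataclass
class Node:
    text: Optional[Vector] = None   # hypothetical text embedding
    image: Optional[Vector] = None  # hypothetical image embedding

def fuse(node: Node) -> Vector:
    """Average whichever modalities are present (text-only, image-only, or mixed)."""
    parts = [v for v in (node.text, node.image) if v is not None]
    if not parts:
        raise ValueError("a node needs at least one modality")
    return [sum(vals) / len(parts) for vals in zip(*parts)]

def exchange(features: List[Vector]) -> List[Vector]:
    """One dense round: every node adds an attention-weighted mix of all nodes."""
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) for k in features]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        mixed = [sum(w * k[d] for w, k in zip(weights, features)) / total
                 for d in range(len(q))]
        out.append([qi + mi for qi, mi in zip(q, mixed)])  # residual update
    return out

# Usage: node 0 is text-only, node 1 image-only, node 2 mixed.
nodes = [Node(text=[1.0, 0.0]), Node(image=[0.0, 1.0]),
         Node(text=[1.0, 1.0], image=[0.0, 1.0])]
fused = [fuse(n) for n in nodes]
mixed = exchange(fused)
print(mixed[0])  # the text-only node now has a nonzero second component
```

In a real model this exchange would run with learned projections inside each denoising step; the sketch keeps only the data flow.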
🎯 Use Cases
It fits interactive scene design for interior design and manufacturing, VR/AR content creation, and building simulation environments for robotics.
📊 Experimental Results
・Bedroom FID (lower is better) drops from 42.38 for MMGDreamer to 35.01, a 17.4% improvement
・CLIPScore of 0.2386 is the best of all methods, and users rate style consistency 8.72/10
・Inference without textures takes 6.83s, about 85% faster than MMGDreamer's 45.34s
・Per-object quality also improves, e.g. a 43.90% lower (better) minimum matching distance (MMD) on nightstands
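The headline percentages can be checked directly from the reported numbers. This small sketch assumes both percentages are relative reductions against MMGDreamer's value, which matches the figures quoted above. The nightstand MMD figure is left out because its baseline values are not given here.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percent reduction of a lower-is-better metric (FID, runtime) vs. a baseline."""
    return 100.0 * (baseline - new) / baseline

fid_gain = relative_reduction(42.38, 35.01)  # bedroom FID, MMGDreamer -> FlowScene
time_gain = relative_reduction(45.34, 6.83)  # texture-free inference time in seconds
print(f"FID: {fid_gain:.1f}% lower, time: {time_gain:.1f}% lower")
# FID: 17.4% lower, time: 84.9% lower
```

The 84.9% time reduction is the "about 85% faster" claim; expressed as a ratio, 45.34 s / 6.83 s is roughly a 6.6× speedup.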
#3DGeneration #GenerativeAI