🏠 Describe a room in plain text, and out comes a complete, physics-ready scene a robot can actually interact with. That's SceneSmith, an ICML 2026 Spotlight from MIT and Toyota Research Institute.
Title: nepfaff/scenesmith (SceneSmith)
URL:
🏠 Overview
SceneSmith is an agentic system that generates simulation-ready indoor scenes from natural language. It produces furniture, wall-mounted mirrors and artwork, ceiling chandeliers, and small tabletop items — all with physical properties like mass and inertia — so the scenes can be used directly for robot training and evaluation.
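The post does not say how SceneSmith computes mass and inertia. One common approach is to approximate an object as a uniform solid box, whose principal moments of inertia about the center of mass have a closed form. The `solid_box_inertia` function and the example dimensions below are hypothetical and not taken from SceneSmith.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxInertia:
    """Principal moments of inertia (kg*m^2) about the center of mass."""

    ixx: float
    iyy: float
    izz: float


def solid_box_inertia(mass: float, x: float, y: float, z: float) -> BoxInertia:
    """Inertia of a uniform solid box with mass in kg and side lengths in meters.

    Closed form: I_xx = m/12 * (y^2 + z^2), and cyclically for the y and z axes.
    """
    if mass <= 0 or min(x, y, z) <= 0:
        raise ValueError("mass and side lengths must be positive")
    k = mass / 12.0
    return BoxInertia(
        ixx=k * (y * y + z * z),
        iyy=k * (x * x + z * z),
        izz=k * (x * x + y * y),
    )


if __name__ == "__main__":
    # Hypothetical example: a 12 kg side table approximated as a 0.5 x 0.5 x 0.6 m box.
    print(solid_box_inertia(12.0, 0.5, 0.5, 0.6))
```

Irregular meshes would typically need mesh-based inertia computation instead. The box is only a coarse stand-in.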
❓ Challenges Solved
Building realistic indoor scenes for robot simulation has traditionally required manual 3D modeling or tedious hand composition, a major bottleneck for scaling robot evaluation and training. SceneSmith removes this bottleneck by automatically generating diverse, contextually coherent scenes from text prompts.
💡 Methodology & Approach
Scene generation runs as a sequential five-stage pipeline:
・Floor plan generation (walls and floor layout)
・Large furniture placement
・Wall-mounted objects (mirrors, artwork, shelves, clocks)
・Ceiling fixtures (chandeliers, pendant lights, ceiling fans)
・Manipulable small objects
A checkpoint is saved automatically after each stage, so a run can be resumed or branched from any intermediate stage. Scene reasoning and task decomposition are handled by a VLM agent (GPT-5).
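The stage ordering and per-stage checkpointing can be sketched as a simple loop. This is a minimal sketch, assuming each stage is a function that takes and returns a JSON-serializable scene dict. The stage keys, `run_pipeline`, `make_stage`, and the JSON checkpoint layout are hypothetical and do not reflect SceneSmith's actual code or file format. The GPT-5 agent is not modeled.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical keys mirroring the five pipeline stages, in execution order.
STAGES = ["floor_plan", "furniture", "wall_objects", "ceiling_fixtures", "small_objects"]

StageFn = Callable[[dict], dict]


def run_pipeline(
    stage_fns: dict[str, StageFn],
    out_dir: Path,
    resume_from: Path | None = None,
) -> dict:
    """Run the stages in order, writing <stage>.json to out_dir after each one.

    If resume_from points at a checkpoint file, load it and continue with the
    stage that follows it (the file's stem names the completed stage).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    if resume_from is None:
        scene, start = {"objects": []}, 0
    else:
        scene = json.loads(resume_from.read_text())
        start = STAGES.index(resume_from.stem) + 1
    for name in STAGES[start:]:
        scene = stage_fns[name](scene)
        (out_dir / f"{name}.json").write_text(json.dumps(scene))
    return scene


def make_stage(label: str) -> StageFn:
    """Hypothetical toy stage that records its label instead of placing real objects."""

    def stage(scene: dict) -> dict:
        return {"objects": scene["objects"] + [label]}

    return stage


if __name__ == "__main__":
    fns = {name: make_stage(name) for name in STAGES}
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        full = run_pipeline(fns, run_dir)
        # Branch: reuse the furniture checkpoint but swap in a different wall stage.
        alt_fns = dict(fns, wall_objects=make_stage("wall_objects_alt"))
        branch = run_pipeline(alt_fns, Path(tmp) / "branch", resume_from=run_dir / "furniture.json")
        print(full["objects"])
        print(branch["objects"])
```

Here, branching means resuming from an earlier checkpoint into a new output directory. The original run's later checkpoints stay untouched.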
🎯 Use Cases & Tech
・3D assets are generated with SAM3D (recommended, for higher quality) or Hunyuan3D-2; retrieval from HSSD and Objaverse is also supported
・AmbientCG PBR materials are applied via CLIP-based semantic search, and articulated objects from ArtVIP and PartNet-Mobility are supported along with their joint kinematics
・Scenes are output natively in Drake format and can be exported to MuJoCo, USD, and Isaac Sim
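CLIP-based semantic search comes down to ranking candidates by cosine similarity between a text embedding and precomputed material embeddings. This sketch assumes those embeddings already exist. Computing them with a CLIP model is not shown. The material names and three-dimensional toy vectors are invented for illustration.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k_materials(
    query_emb: list[float], material_embs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Rank materials by cosine similarity to the query embedding, best first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in material_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-D stand-ins for CLIP embeddings; real ones would come from a CLIP encoder.
    materials = {
        "wood_oak": [0.9, 0.1, 0.0],
        "marble_white": [0.1, 0.9, 0.1],
        "fabric_linen": [0.0, 0.2, 0.95],
    }
    query = [0.85, 0.15, 0.05]  # pretend embedding of "oak tabletop"
    for name, score in top_k_materials(query, materials, k=2):
        print(f"{name}: {score:.3f}")
```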
📊 Highlights
・Given a task such as "find a fruit from the bowl and place it on a plate," it generates multiple scene variations that satisfy the task's constraints, which can then be used for robot evaluation
・A 151-word prompt yields a community center, with contextual details inferred, such as ping pong paddles and balls placed near the table
・Geometry generation is distributed across GPUs, and bubblewrap sandboxing isolates rendering to prevent out-of-memory (OOM) failures
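The post does not detail how SceneSmith splits geometry generation across GPUs. To illustrate the general idea, this sketch uses the greedy longest-processing-time-first (LPT) heuristic: sort jobs by estimated cost, then assign each one to the currently least-loaded GPU. The job names, cost estimates, and `assign_jobs` are hypothetical, and bubblewrap sandboxing is not modeled.

```python
from __future__ import annotations

import heapq


def assign_jobs(job_costs: dict[str, float], num_gpus: int) -> dict[int, list[str]]:
    """Greedy LPT: hand each job, most expensive first, to the least-loaded GPU."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    plan: dict[int, list[str]] = {gpu: [] for gpu in range(num_gpus)}
    # sorted() is stable even with reverse=True, so equal-cost jobs keep input order.
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, gpu = heapq.heappop(heap)
        plan[gpu].append(job)
        heapq.heappush(heap, (load + cost, gpu))
    return plan


if __name__ == "__main__":
    # Hypothetical per-asset cost estimates (e.g. expected seconds of generation).
    costs = {"sofa": 5.0, "table": 3.0, "lamp": 2.0, "mug": 1.0, "chair": 3.0}
    for gpu, jobs in assign_jobs(costs, num_gpus=2).items():
        print(gpu, jobs, sum(costs[j] for j in jobs))
```

With these example costs, both GPUs end up with a total estimated load of 7.0. LPT is a simple heuristic, not an optimal scheduler.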
#Robotics #SceneGeneration