Ruohan Zhang (@RuohanZhang76): Excited to introduce StereoPolicy, led by @EvansXuHan. 📷📷🤖 StereoPolicy is an effective way to add geometric cues to modern robot policy models while keeping the strengths of pretrained 2D encoders. ⁉️ Why stereo for robot manipulation? Monocular RGB often lacks the depth cues needed for precise manipulation, while RGB-D and point clouds can be noisy or brittle, especially on reflective and transparent objects in real-world deployment. Instead of explicitly reconstructing disparity, depth, or point clouds, StereoPolicy directly fuses synchronized left/right RGB views to learn implicit stereo cues, avoiding extra reconstruction latency that can make real-time manipulation difficult. Project Page:

2026.06.03 20:58
