Excited to introduce StereoPolicy, led by
@EvansXuHan.
๐ท๐ท๐คStereoPolicy is an effective way to add geometric cues to modern robot policy models while keeping the strengths of pretrained 2D encoders.
โ๏ธWhy stereo for robot manipulation?
Monocular RGB often lacks the depth cues needed for precise manipulation, while RGB-D and point clouds can be noisy or brittle, especially on reflective and transparent objects in real-world deployment.
Instead of explicitly reconstructing disparity, depth, or point clouds, StereoPolicy directly fuses synchronized left/right RGB views to learn implicit stereo cues, avoiding extra reconstruction latency that can make real-time manipulation difficult.
Project Page: