Camera pose matters for video understanding!
Today's MLLMs excel at recognizing activities but still struggle with the underlying spatial structure and the ego/object dynamics in video. We trace this gap to a missing piece: camera pose.
Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)
Cameras rolling, the countdown to our meeting has begun⏰—every bit of anticipation and affection shines bright.
This is a date where love meets love. Are you ready to meet Boss @layzhang?
#LAY# #LayZhang#