Jihan Yang (@jihanyang13): Camera pose matters for video understanding! Today's MLLMs excel at recognizing activities, but still struggle with the underlying space and ego/object dynamics in video. We trace this gap to a missing piece: camera pose. Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)

Jihan Yang

@jihanyang13

@amilabs; Prev. @NYU_Courant @HKUniversity; Researcher in Deep Learning, Computer Vision.

Joined November 2018

506 Following, 1.2K Followers

Jihan Yang@jihanyang13

2026.05.26 23:14

