Making AI "think in words" about space may be counterproductive. 🧠 A new approach has arrived that has the model "imagine" unseen viewpoints in its head.
Title: Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
URL:
🧠 Overview
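This work proposes Imaginative Perception Tokens (IPT) to strengthen spatial reasoning in vision-language models (VLMs). Rather than forcing spatial logic into language, it keeps the perception of "what would be visible under a different arrangement" as an intermediate representation.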
❓ Problem addressed
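VLMs struggle with spatial reasoning such as predicting how a scene looks from an unseen viewpoint, following occluded paths, and integrating multiple partial observations. Earlier approaches tried to solve these with textual chain-of-thought, but forcing visual reasoning into language alone is a poor fit, and performance plateaued.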
💡 Methodology and proposed approach
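・Uses BAGEL, a unified VLM, as the backbone and trains it with IPT-based supervised learning
・Formulates three tasks: perspective taking (PET), path tracing (PT), and multi-view counting (MVC)
・Builds a dataset of roughly 20,000 examples (with ground truth, answers, and evaluation metrics)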
The core idea is to treat the perception itself, "what would I see if I moved like this?", as the intermediate representation.
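To make the idea of a perceptual intermediate representation concrete, here is a minimal sketch of how one training example might be laid out. Everything in it is an assumption made for illustration: the class names (`Task`, `TextStep`, `ImaginedView`, `Example`), the helper functions, and the sample question are hypothetical and are not taken from the paper or from BAGEL. A real system would store image tokens or latents where this sketch stores a short description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Task(Enum):
    """The three task families named in the summary."""
    PET = "perspective taking"
    PT = "path tracing"
    MVC = "multi-view counting"


@dataclass
class TextStep:
    """A reasoning step expressed in words."""
    text: str


@dataclass
class ImaginedView:
    """Hypothetical stand-in for a block of imaginative perception tokens.

    A real system would hold image-token ids or latents here; a short
    description keeps this sketch self-contained.
    """
    description: str


Step = Union[TextStep, ImaginedView]


@dataclass
class Example:
    task: Task
    question: str
    steps: List[Step]  # intermediate representation supervised during training
    answer: str        # ground-truth answer used for scoring


def step_kinds(example: Example) -> List[str]:
    """Label each intermediate step as 'text' or 'imagine'."""
    return ["imagine" if isinstance(s, ImaginedView) else "text"
            for s in example.steps]


def is_text_only(example: Example) -> bool:
    """True when the trace is a plain textual chain of thought."""
    return all(isinstance(s, TextStep) for s in example.steps)


# Made-up sample, for illustration only.
example = Example(
    task=Task.MVC,
    question="How many chairs are in the room across both photos?",
    steps=[
        ImaginedView("the room as seen from a viewpoint covering both photos"),
        TextStep("Count the chairs visible in the imagined view."),
    ],
    answer="4",
)

print(step_kinds(example))    # ['imagine', 'text']
print(is_text_only(example))  # False
```

Separating `ImaginedView` from `TextStep` makes the contrast with a text-only chain of thought explicit: a text CoT trace is simply an `Example` whose steps are all `TextStep`.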
📊 Experimental results
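・On multi-view counting (MVC), using IPT improves accuracy by 3.4%
・On path tracing (PT), performance is competitive with closed-source models
・IPT-based supervised learning outperforms textual chain-of-thought (CoT)
・Conversely, textual CoT was found to substantially degrade spatial reasoning performance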
#SpatialReasoning #MultimodalLLM