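🎬 Distilled autoregressive video models are fast, but they tend to drift away from human preferences. Astrolabe tackles this problem by running reinforcement-learning alignment in the "forward process", with no re-distillation and no reverse-process unrolling.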
Title: Astrolabe: Steering Forward-Process Reinforcement Learning
URL:
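📝 Overview
Astrolabe is a reinforcement learning framework that aligns distilled autoregressive (AR) video models with human visual preferences. Its defining feature is that it performs RL in the forward process rather than through conventional reverse-process optimization. It is a large study, at 53 pages and 37 figures in total.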
â 解決ãã課é¡
èžçARåç»ã¢ãã«ã¯å¹ççãªã¹ããªãŒãã³ã°çæã«åãäžæ¹ã人éã®å¥œã¿ãšä¹é¢ãããããšãã匱ç¹ããããŸããããã«æ¢åã®RLã¯ãããããã¢ãŒããã¯ãã£ã«èªç¶ã«ã¯åããŸãããäžè¬ã«ãé«ã³ã¹ããªåèžçãããœã«ããŒçµåã®éããã»ã¹æé©åã®ãããããå¿
èŠãšããã©ã¡ããéãã¹ã±ãŒã«ãã«ãããã®ã§ããã
💡 Methodology and proposed approach
The method is built from three components.
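・Negative-aware fine-tuning: contrasts positive and negative samples at the inference endpoint, which establishes an implicit policy-improvement direction without unrolling the reverse process
・Streaming training: generates the sequence incrementally with a rolling KV cache and applies RL updates only to a local clip window. Long-range coherence is maintained by conditioning on the preceding context
・Multi-reward objective: combines uncertainty-aware selective regularization with dynamic reference updates to mitigate reward hacking (a collapse in which only the apparent score goes up)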
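The streaming-training idea above can be illustrated with a small scheduling sketch. This is not the authors' implementation. It only shows, under stated assumptions, how a rolling cache bounds the context for each chunk while RL updates are restricted to a local clip window. The function name `streaming_schedule` and its parameters (`cache_size`, `window_start`, `window_len`) are hypothetical, and plain chunk indices stand in for the cached key/value tensors.

```python
from collections import deque
from typing import Iterator, List, Tuple


def streaming_schedule(
    num_chunks: int,
    cache_size: int,
    window_start: int,
    window_len: int,
) -> Iterator[Tuple[int, List[int], bool]]:
    """Yield (chunk index, context chunk indices, gets_rl_update) per step.

    Assumption: chunk indices stand in for cached key/value tensors, and
    the clip window is a contiguous range of chunks chosen by the caller.
    """
    # Rolling cache: once full, appending evicts the oldest entry.
    cache: deque = deque(maxlen=cache_size)
    clip_window = range(window_start, window_start + window_len)

    for t in range(num_chunks):
        # Each chunk is conditioned only on what is currently cached.
        context = list(cache)
        # Only chunks inside the clip window would receive an RL update;
        # the others are generated purely to provide context.
        yield t, context, t in clip_window
        cache.append(t)
```

For example, `list(streaming_schedule(6, 3, 4, 2))` walks six chunks with a three-chunk cache and marks only chunks 4 and 5 for an update, each still conditioned on the three chunks before it. In a real training loop the unmarked chunks would presumably be generated without gradients, which is what keeps the cost of an update independent of the full sequence length.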
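🎯 Use cases
Astrolabe suits real-time, streaming video generation where you want to align an efficient distilled model with human preferences while keeping its speed. It can be applied to multiple distilled AR video models and raises quality without sacrificing lightweight inference.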
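📊 Significance and results
・By avoiding the heavy routes of re-distillation and reverse-process unrolling, it addresses the computational-efficiency bottleneck
・It combines forward-process negative-aware fine-tuning, streaming updates, and reward-hacking countermeasures into a robust and scalable alignment solution
・Its effectiveness is shown across multiple distilled AR models, and the paper includes detailed quantitative evaluation and ablations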
#VideoGeneration #ReinforcementLearning