Deep reinforcement learning (DRL) is an advanced subfield of artificial intelligence (AI) and machine learning (ML). It combines deep learning techniques with reinforcement learning algorithms to build intelligent agents that make decisions through trial and error in order to optimize a long-term goal or reward. This lets agents learn continuously from their interactions with complex, dynamic, and uncertain environments. At the core of DRL is the use of neural networks to approximate complex functions and to efficiently estimate the value of actions or states from observations of the environment. These capabilities have allowed DRL to reach notable milestones in applications such as robotics, natural language processing, recommendation systems, autonomous vehicles, and games.
Two concepts sit at the center of DRL: reinforcement learning, which focuses on learning an optimal policy through interaction with the environment, and deep learning, which uses artificial neural networks to generalize and represent complex patterns and relationships in data. Combining the two extends the capabilities of both. Deep learning contributes the ability to scale and generalize to large state spaces and complex functions, while reinforcement learning guides the learning process through the exploration-exploitation trade-off, so that the agent's performance improves consistently over time.
A DRL framework typically includes the following components: the environment, the agent, states, actions, and rewards. The environment is the context in which the agent operates. The agent is the AI-driven component: it interacts with the environment through actions and learns to make better decisions based on the state changes it observes and the rewards it receives for specific actions. The agent aims to develop an optimal policy that maximizes the cumulative reward (also called the return) over an episode or multiple time steps, weighing both the immediate and the future value of each action to achieve better long-term outcomes.
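The cumulative reward described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `discounted_return` and the discount factor `gamma` are assumptions introduced here (the text only speaks of cumulative reward; discounting future rewards is a common convention for weighing immediate against future value).

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return the sum of rewards, each weighted by gamma ** (steps ahead).

    gamma is an assumed discount factor: 1.0 gives the plain cumulative
    reward, 0.0 counts only the immediate reward.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` the function reduces to a plain sum of the episode's rewards; smaller values make the agent value near-term rewards more heavily than distant ones.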
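To achieve this, DRL techniques generally combine value-based and policy-based methods. Value-based methods, such as Q-learning and temporal difference learning, aim to estimate the value function associated with each state-action pair. In contrast, policy-based methods, such as Policy Gradient and Actor-Critic, try to learn the optimal policy by explicitly optimizing an objective function related to the expected return (Actor-Critic also learns a value estimate alongside the policy, so in practice it is a hybrid). Both approaches have their own advantages and challenges, and successful DRL applications often adopt hybrid techniques to improve overall performance and stability.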
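As a rough illustration of the value-based idea, the sketch below shows a single tabular Q-learning update. It makes several assumptions that are not part of the text: the helper names `make_q_table` and `q_learning_update`, the learning rate `alpha`, and the discount factor `gamma` are all chosen here for illustration. It also uses a lookup table, whereas DRL methods such as DQN replace the table with a neural network.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List

QTable = DefaultDict[Hashable, List[float]]


def make_q_table(n_actions: int) -> QTable:
    """Create a table that maps each state to one value per action (all 0.0)."""
    return defaultdict(lambda: [0.0] * n_actions)


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: int,
    reward: float,
    next_state: Hashable,
    done: bool,
    alpha: float = 0.1,   # assumed learning rate
    gamma: float = 0.99,  # assumed discount factor
) -> float:
    """Apply one temporal-difference update and return the TD error."""
    # Bootstrapped target: reward plus the discounted best value of the
    # next state, or just the reward if the episode has ended.
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * td_error
    return td_error
```

Each call nudges the estimate for one state-action pair toward the observed reward plus the best value currently predicted for the next state, which is the temporal-difference idea the paragraph refers to.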
Training a DRL agent effectively often requires overcoming several challenges. For example, the exploration-exploitation trade-off is central to balancing the collection of new information about the environment against the use of existing knowledge to optimize rewards. In addition, learning in large, high-dimensional state spaces, handling partial observability, managing noisy or delayed rewards, and transferring learned knowledge across tasks are among the key challenges DRL algorithms must address to improve overall performance and robustness.
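One common way to handle the exploration-exploitation trade-off is epsilon-greedy action selection, sketched below. The text does not name this strategy, so treat it as one illustrative option; the function name `epsilon_greedy` and the parameter `epsilon` are assumptions introduced here.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q_values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index from the estimated action values.

    With probability epsilon a uniformly random action is chosen
    (exploration); otherwise the highest-valued action is chosen
    (exploitation).
    """
    if not q_values:
        raise ValueError("q_values must not be empty")
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In practice `epsilon` is often started high and decayed over training, so the agent explores widely at first and relies more on what it has learned later on.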
To address these challenges, a range of DRL algorithms has been proposed, including Deep Q-Networks (DQN), Asynchronous Advantage Actor-Critic (A3C), and Deep Deterministic Policy Gradient (DDPG), and they have shown remarkable success in many domains. For example, DRL has been used to beat skilled human players at classic Atari games, to master Go, once considered a stronghold of human intelligence, and to perform advanced maneuvers in complex robotics tasks. DRL has also found practical use in areas such as finance, healthcare, supply chain optimization, and computer vision.
In the context of the AppMaster platform, a powerful no-code tool that can generate backend, web, and mobile applications, DRL can be used to automate and optimize various aspects of the development and application lifecycle. For example, DRL-based algorithms can be used to optimize resource allocation, perform load balancing, and even automate the testing and debugging of complex applications. DRL can also personalize and optimize the user experience based on user behavior and preferences, contributing to adaptive, dynamic user interfaces. This can improve customer satisfaction, retention, and engagement for applications built on the AppMaster platform.
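In summary, deep reinforcement learning represents a promising path forward in AI and machine learning, offering advanced capabilities for adapting, learning, and optimizing decision-making processes in complex, dynamic environments. As DRL techniques continue to improve and mature, they are expected not only to deliver new breakthroughs in a variety of domains but also to play an important role in shaping the future of application development and digital transformation across industries.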