Theoretical research of the policy gradient reinforcement learning without Markov properties and its application to games

Research Project

Project/Area Number	26330419
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Entertainment and game informatics 1
Research Institution	Shibaura Institute of Technology
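Principal Investigator	Harukazu Igarashi Shibaura Institute of Technology, Faculty of Engineering, Professor (80288886)
Co-Investigator (Renkei-kenkyūsha)	ISHIHARA Seiji Tokyo Denki University, School of Science and Engineering, Associate Professor (50351656)
Research Collaborator	MORIOKA Yuichi, YAMAMOTO Kazumasa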
Project Period (FY)	2014-04-01 – 2017-03-31
Project Status	Completed (Fiscal Year 2016)
Budget Amount	¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000); Fiscal Year 2016: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2015: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2014: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Keywords	reinforcement learning / policy gradient method / multi-agent / computer shogi / RoboCup / softmax search / soccer / multi-agent system / fuzzy inference
Outline of Final Research Achievements	In this research project, we conducted theoretical and practical research to develop representations of policy functions and learning methods for policy gradient reinforcement learning algorithms. Our final goal is to construct a general methodology that can be applied to computer games and engineering fields. The results of this project are as follows. (1) Theoretical research on policy gradient reinforcement learning: we proposed new methods for ① hierarchical reinforcement learning that learns higher-level strategies of agents, ② learning with separated knowledge of environmental dynamics and action-values in agent policies, and ③ learning with a fuzzy controller for policies. (2) Practical application of policy gradient reinforcement learning: we applied the proposed learning methods to pursuit games, robot soccer games, and computer shogi, and examined their effectiveness.

Report

(4 results)

2016 Annual Research Report
Final Research Report (PDF)
2015 Research-status Report
2014 Research-status Report

Research Products
(14 results)

Journal Article (4 results; of which Peer Reviewed: 2, Acknowledgement Compliant: 2, Open Access: 1), Presentation (10 results)

[Journal Article] Hierarchical Policy Gradient Reinforcement Learning: Two-layer Model (2017)
- Author(s)
  Harukazu Igarashi and Seiji Ishihara
- Journal Title
  The Research Reports of Shibaura Institute of Technology, Natural Sciences and Engineering
  Volume: 60 Pages: 21-28
- DOI
  10.13140/RG.2.2.19842.89285
- Related Report
  2016 Annual Research Report
[Journal Article] Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies (2016)
- Author(s)
  Seiji Ishihara, Harukazu Igarashi
- Journal Title
  IEEJ Transactions on Electronics, Information and Systems
  Volume: 136 Issue: 3 Pages: 282-289
- DOI
  10.1541/ieejeiss.136.282
- NAID
  130005132276
- ISSN
  0385-4221, 1348-8155
- Related Report
  2015 Research-status Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Learning Positional Evaluation Functions without Using Databases of Game Records between Professional Shogi Players (2016)
- Author(s)
  Harukazu Igarashi, Yuichi Morioka, Kazumasa Yamamoto
- Journal Title
  The Research Reports of Shibaura Institute of Technology, Natural Sciences and Engineering
  Volume: 59 Pages: 39-47
- DOI
  10.13140/RG.2.1.4797.2242
- Related Report
  2015 Research-status Report
- Acknowledgement Compliant
[Journal Article] Policy Gradient Reinforcement Learning with a Fuzzy Controller for Policy: Decision Making in RoboCup Soccer Small Size League (2014)
- Author(s)
  杉本将也, Harukazu Igarashi, Seiji Ishihara, 田中一基
- Journal Title
  Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
  Volume: 26 Issue: 3 Pages: 647-657
- DOI
  10.3156/jsoft.26.647
- NAID
  130004491924
- ISSN
  1347-7986, 1881-7203
- Related Report
  2014 Research-status Report
- Peer Reviewed / Open Access
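[Presentation] Determining Movement Destinations of Soccer Agents Using a Positional Evaluation Function (2016)
- Author(s)
  大内斉, Harukazu Igarashi
- Organizer
  Information Processing Society of Japan
- Place of Presentation
  Hakone Seminar House (845 Sengokuhara, Hakone-machi, Ashigarashimo-gun, Kanagawa Prefecture)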
- Year and Date
  2016-11-04
- Related Report
  2016 Annual Research Report
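[Presentation] A Simple Game-Tree Search Method Using a Softmax Strategy and Depth Control Based on Realization Probability (2016)
- Author(s)
  原悠一, Harukazu Igarashi, Yuichi Morioka, Kazumasa Yamamoto
- Organizer
  Information Processing Society of Japan
- Place of Presentation
  Hakone Seminar House (845 Sengokuhara, Hakone-machi, Ashigarashimo-gun, Kanagawa Prefecture)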
- Year and Date
  2016-11-04
- Related Report
  2016 Annual Research Report
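[Presentation] Reinforcement Learning of Through Passes in Soccer Agents (2016)
- Author(s)
  田川諒, Harukazu Igarashi
- Organizer
  Institute of Electronics, Information and Communication Engineers and others
- Place of Presentation
  University of Toyama (Toyama City, Toyama Prefecture)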
- Year and Date
  2016-09-07
- Related Report
  2016 Annual Research Report
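[Presentation] Reinforcement Learning of Positional Evaluation Functions in Soccer Agents (2015)
- Author(s)
  田川諒, Harukazu Igarashi
- Organizer
  Information Processing Society of Japan, 20th Game Programming Workshop
- Place of Presentation
  Karuizawa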
- Year and Date
  2015-11-06
- Related Report
  2015 Research-status Report
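[Presentation] Supervised Learning of Positional Evaluation Functions Using Policy Gradients in Computer Shogi (2015)
- Author(s)
  大串明, Kazumasa Yamamoto, Yuichi Morioka, Harukazu Igarashi
- Organizer
  Information Processing Society of Japan, 20th Game Programming Workshop
- Place of Presentation
  Karuizawa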
- Year and Date
  2015-11-06
- Related Report
  2015 Research-status Report
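[Presentation] A Study of Methods for Learning Positional Evaluation Functions without Using Databases of Game Records of Professional Shogi Players (2015)
- Author(s)
  Harukazu Igarashi, Yuichi Morioka, Kazumasa Yamamoto
- Organizer
  Information Processing Society of Japan, 34th Game Informatics Research Meeting
- Place of Presentation
  Fukuoka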
- Year and Date
  2015-07-04
- Related Report
  2015 Research-status Report
[Presentation] Policy Gradient Method Using Fuzzy Controller in Policies and Its Application (2014)
- Author(s)
  Noor Imanina N.H., Harukazu Igarashi
- Organizer
  The International Conference on Artificial Intelligence and Pattern Recognition
- Place of Presentation
  Kuala Lumpur, Malaysia
- Year and Date
  2014-11-17 – 2014-11-19
- Related Report
  2014 Research-status Report
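[Presentation] A Study of Search Control by the Policy Gradient Method (2014)
- Author(s)
  Harukazu Igarashi, Yuichi Morioka, Kazumasa Yamamoto
- Organizer
  19th Game Programming Workshop 2014
- Place of Presentation
  Hakone, Kanagawa Prefecture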
- Year and Date
  2014-11-07 – 2014-11-09
- Related Report
  2014 Research-status Report
[Presentation] Weight Tuning of the Evaluation Function for Chain Actions in agent2d (2014)
- Author(s)
  田川諒, 谷川俊策, Harukazu Igarashi
- Organizer
  13th Forum on Information Technology (FIT2014)
- Place of Presentation
  Tsukuba, Ibaraki Prefecture
- Year and Date
  2014-09-03
- Related Report
  2014 Research-status Report
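[Presentation] Design and Learning of Positional Evaluation Functions in the RoboCup Soccer Simulation 2D League (2014)
- Author(s)
  谷川俊策, Harukazu Igarashi, Seiji Ishihara
- Organizer
  Conference on Robotics and Mechatronics 2014
- Place of Presentation
  Toyama, Toyama Prefecture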
- Year and Date
  2014-05-26
- Related Report
  2014 Research-status Report
