Project/Area Number | 17K12737 |
Research Category | Grant-in-Aid for Young Scientists (B) |
Allocation Type | Multi-year Fund |
Research Field | Intelligent informatics |
Research Institution | Tokyo University of Agriculture and Technology |
Principal Investigator | Yano Shiro, Tokyo University of Agriculture and Technology, Graduate School of Engineering, Assistant Professor (90636789) |
Project Period (FY) | 2017-04-01 – 2019-03-31 |
Project Status | Completed (Fiscal Year 2018) |
Budget Amount | ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2018: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2017: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000) |
Keywords | Reinforcement learning / Mirror descent / Bayesian inference / Nesterov acceleration / Direct policy search / Nesterov's accelerated method / Machine learning |
Outline of Final Research Achievements |
This project addressed three issues: 1. proposing and extending a direct policy search method based on the mirror descent method; 2. studying the relationship between the mirror descent method and Bayes' theorem; 3. applying the proposed reinforcement learning algorithms to tasks including locomotion simulation, deep reinforcement learning benchmarks, and robotic arm control. The project proposed "mirror descent search", and the accelerated mirror descent method was then applied to it. The project also studied Bayesian inference algorithms from the viewpoint of the mirror descent method. The proposed methods were evaluated on the following tasks: 1. training a convolutional neural network (a problem of roughly 5e8 dimensions); 2. locomotion learning in a physics engine; 3. a robotic arm control problem in the real world. (A hedged code sketch of this family of updates is given below the record.)
|
Academic Significance and Societal Importance of the Research Achievements |
When an agent must optimize its behavior to satisfy another party's values or the scoring criteria of a competition (an objective function), it is difficult to know this objective function in advance when facing an unfamiliar counterpart or an unfamiliar competition. This project deals with the problem of optimizing behavior (a policy function) on the spot, without a model of the problem at hand, which is an important problem for artificial agents operating in unknown environments. More practically, both the action space and the state space must be assumed to be high-dimensional and continuous; this project presents algorithm designs for such high-dimensional reinforcement learning problems together with several application examples.
|
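For readers unfamiliar with this family of methods, the following is a minimal illustrative sketch, not the project's published "mirror descent search" implementation. It shows a generic KL-regularized (mirror-descent-style) update of a Gaussian search distribution over policy parameters: sampled parameters are reweighted by exponentiated returns, which has the same multiplicative form as a Bayesian posterior update. The objective rollout_return, the step size eta, and all function names are assumptions made for illustration only.

# Hedged sketch of a mirror-descent-style direct policy search step (Python/NumPy).
# Not the authors' exact algorithm: a generic illustration of how KL mirror descent
# over a search distribution yields a multiplicative, Bayes-like reweighting.
import numpy as np

def rollout_return(theta, rng):
    """Hypothetical black-box objective: return of one rollout for parameters theta."""
    return -np.sum((theta - 1.0) ** 2) + 0.1 * rng.standard_normal()

def mirror_descent_search_step(mean, cov, eta, n_samples, rng):
    """One KL-regularized (mirror-descent-style) update of a Gaussian search distribution.

    Samples parameters, weights them by exp(eta * return) -- the exponentiated
    weighting induced by mirror descent with a KL Bregman divergence, which also
    matches a Bayesian posterior with pseudo-likelihood exp(eta * R) -- and
    re-fits the Gaussian to the weighted samples.
    """
    thetas = rng.multivariate_normal(mean, cov, size=n_samples)
    returns = np.array([rollout_return(t, rng) for t in thetas])
    # Exponentiated weights; subtract the max return for numerical stability.
    w = np.exp(eta * (returns - returns.max()))
    w /= w.sum()
    new_mean = w @ thetas
    centered = thetas - new_mean
    # Weighted covariance with a small jitter to keep it positive definite.
    new_cov = (w[:, None] * centered).T @ centered + 1e-6 * np.eye(len(mean))
    return new_mean, new_cov

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean, cov = np.zeros(3), np.eye(3)
    for _ in range(50):
        mean, cov = mirror_descent_search_step(mean, cov, eta=2.0, n_samples=64, rng=rng)
    print("estimated optimum:", mean)  # should approach [1, 1, 1] for this toy objective

In this toy setting the search distribution concentrates on the maximizer of the rollout return; the Nesterov-accelerated variant mentioned in the outline would additionally maintain a momentum-like auxiliary distribution, which is omitted here for brevity.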