Developing a theory of deep reinforcement learning equipped with bounded rationality
Project/Area Number |
17H04696
|
Research Category |
Grant-in-Aid for Young Scientists (A)
|
Allocation Type | Single-year Grants |
Research Field |
Soft computing
|
Research Institution | Tokyo Denki University |
Principal Investigator |
|
Project Period (FY) |
2017-04-01 – 2020-03-31
|
Project Status |
Completed (Fiscal Year 2019)
|
Budget Amount |
¥23,660,000 (Direct Cost: ¥18,200,000, Indirect Cost: ¥5,460,000)
Fiscal Year 2019: ¥6,630,000 (Direct Cost: ¥5,100,000, Indirect Cost: ¥1,530,000)
Fiscal Year 2018: ¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
Fiscal Year 2017: ¥10,660,000 (Direct Cost: ¥8,200,000, Indirect Cost: ¥2,460,000)
|
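Keywords | bounded rationality / reinforcement learning / satisficing / social learning / weakly instructive learning / decision problem / hypothesis testing / trial and error / motivation / instructive feedback / evaluative feedback / counter-imitation / competition / satisficing principle / semi-instructive feedback / meta-information / imitation learning / emulation / social satisficing / prospect theory / decision making / causal inference / machine learning |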
Outline of Final Research Achievements |
Real-world agents such as humans and animals learn and act in a (boundedly) rational way toward their respective goals. This learning and acting takes place under severe constraints on perception, information processing, and actuation. In this project, we hypothesized that efficient learning and acting are enabled by a search and decision-making policy called "satisficing," which was proposed as an alternative to optimization. We developed a new implementation of satisficing (RS), established it as a useful algorithm, and proved its efficiency. We applied RS to a range of reinforcement learning tasks, from the most basic bandit problems to general tabular and non-tabular MDPs, and demonstrated its efficiency in each.
|
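The satisficing idea in the outline can be illustrated on a Bernoulli bandit. The sketch below is a generic aspiration-level rule, not the project's RS algorithm, whose details this summary does not give. The function name `satisficing_bandit`, the `aspiration` parameter, and the uniform exploration rule are all assumptions made for illustration: the agent keeps exploring until some arm's observed mean reward reaches the aspiration level, then sticks with that arm instead of continuing to search for the optimum.

```python
import random


def satisficing_bandit(arm_probs, aspiration, steps, seed=0):
    """Run a simple satisficing agent on a Bernoulli bandit.

    Hypothetical illustration only: NOT the RS algorithm from the project.
    arm_probs  -- true success probability of each arm (unknown to the agent)
    aspiration -- reward level the agent considers "good enough"
    steps      -- number of pulls
    Returns (pull counts, empirical means, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    means = [0.0] * k   # empirical mean reward per arm
    total_reward = 0

    for _ in range(steps):
        # Arm with the highest empirical mean so far.
        best = max(range(k), key=lambda i: means[i])
        if counts[best] > 0 and means[best] >= aspiration:
            # Satisfied: the best known arm meets the aspiration, so exploit it.
            arm = best
        else:
            # Unsatisfied: no arm is good enough yet, so explore uniformly.
            arm = rng.randrange(k)

        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return counts, means, total_reward


if __name__ == "__main__":
    counts, means, total = satisficing_bandit([0.2, 0.5, 0.8], 0.7, 1000)
    print(counts, [round(m, 3) for m in means], total)
```

Under these assumptions, exploration stops as soon as an acceptable arm is found and resumes only if that arm's empirical mean later drops below the aspiration level. This is the sense in which satisficing trades guaranteed optimality for reduced search effort.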
Academic Significance and Societal Importance of the Research Achievements |
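We clarified an important part of the logic of autonomous, trial-and-error learning as practiced by humans and animals. In particular, we gave a mechanistic explanation of why humans and animals improve their performance efficiently through competition and "counter-imitation." We also proved this efficiency mathematically and demonstrated it in a variety of settings. In addition, from the perspective of capitalism and markets, we discussed the efficiency of competition and counter-imitation together with the dangers that are inseparable from it.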
|
Report
(4 results)
Research Products
(16 results)