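A Flexible Cognitive Reinforcement Learning Algorithm that Constructs and Infers Hypotheses from Unknown Environments (未知環境から仮説を構築・推論するフレキシブルな認知的強化学習アルゴリズム)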

Research Project

Project/Area Number	14J10453
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Soft computing
Research Institution	Tokyo Denki University
Principal Investigator	甲野 佑 (Yu Kohno), Tokyo Denki University, Graduate School of Advanced Science and Technology, JSPS Research Fellow (DC2)
Project Period (FY)	2014-04-25 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥1,700,000 (Direct Cost: ¥1,700,000); Fiscal Year 2015: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2014: ¥900,000 (Direct Cost: ¥900,000)
Keywords	Satisficing / Reinforcement learning / Decision making / Speed-accuracy trade-off / Symmetry inference
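Outline of Annual Research Achievements	This project aimed to address a central problem of reinforcement learning, namely that learning requires an enormous number of samples (trials), by modeling the flexible decision making of humans. The problem stems from the trade-off between speed (learning with few trials) and accuracy (the quality of the result). Specifically, we tackled this decision-making trade-off with a value function, LS, whose outputs correlate highly with human estimates of causal strength. The previous year's work showed that the LS value function is closely related to the human satisficing policy, and that this is why it achieves flexibility without insisting on accuracy; from this viewpoint we extended the LS value function (LS-VR, LSX). Satisficing avoids the trade-off by setting a target value, the reference value, which relaxes the insistence on accuracy. We also found that, with an appropriately set reference value, the extended LS value functions achieve accuracy, that is, optimization, very quickly. One outcome of this fiscal year is the submission of these results to an English-language journal. We further showed that an advantage of satisficing is that it copes with non-stationary environments better than optimization does, and that it outperforms an existing meta-bandit algorithm designed for non-stationary environments. Until the previous year, the LS value function had been applied only to multi-armed bandit problems, which involve only the immediate, stochastic occurrence or non-occurrence of a reward. This year, we extended the LS value function to more general reinforcement learning, which assumes rewards that require long-term trial and error as well as complex environments (the RLLS value function). Specifically, we applied the RLLS value function to a motor control task with complex physical dynamics (the giant-swing motion). Whereas the other algorithms learned correct control only within a very narrow range of parameters, the RLLS value function learned quickly over a wide range of parameters.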
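The satisficing idea described above (stop insisting on accuracy once an option meets a reference value) can be illustrated on a two-armed Bernoulli bandit. The following is a minimal Python sketch under our own assumptions: the class `SatisficingAgent`, the function `run`, and the parameter `aleph` (the reference value) are hypothetical names, and the selection rule (exploit the best arm once its sample mean reaches `aleph`, otherwise try the least-pulled arm) is a simplified stand-in, not the LS, LS-VR, LSX, or RLLS value functions developed in this project.

```python
import random


class SatisficingAgent:
    """Hypothetical satisficing agent for a Bernoulli multi-armed bandit.

    Illustrative only: this is NOT the LS / LS-VR / LSX / RLLS value function.
    The agent stops exploring once the best sample mean reaches `aleph`.
    """

    def __init__(self, n_arms: int, aleph: float) -> None:
        self.aleph = aleph              # reference (aspiration) value
        self.counts = [0] * n_arms      # number of pulls per arm
        self.means = [0.0] * n_arms     # sample-mean reward per arm

    def select(self) -> int:
        # Pull each arm once so that every estimate is defined.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        best = max(range(len(self.means)), key=lambda i: self.means[i])
        if self.means[best] >= self.aleph:
            return best  # satisfied: keep exploiting the arm that meets aleph
        # Unsatisfied: explore the least-pulled arm (lowest index on ties).
        return min(range(len(self.counts)), key=lambda i: self.counts[i])

    def update(self, arm: int, reward: float) -> None:
        # Incremental sample-mean update.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(probs, aleph, steps, seed=0):
    """Simulate the agent on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    agent = SatisficingAgent(len(probs), aleph)
    for _ in range(steps):
        arm = agent.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
    return agent.counts


if __name__ == "__main__":
    # aleph placed between the two arms' success rates (0.6 and 0.4)
    print(run([0.6, 0.4], aleph=0.5, steps=5000))
```

In this sketch, when `aleph` lies between the best and second-best success rates, being satisfied coincides with settling on the optimal arm; if `aleph` exceeds every attainable reward, the agent is never satisfied and keeps cycling through the arms, which mirrors the role of the reference value described in the outline.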
Research Progress Status	Not entered, as FY2015 is the final year of the project.
Strategy for Future Research Activity	Not entered, as FY2015 is the final year of the project.

Report

(2 results)

2015 Annual Research Report
2014 Annual Research Report

Research Products
(19 results)

Journal Article (9 results; of which Peer Reviewed: 2, Open Access: 2, Acknowledgement Compliant: 1), Presentation (10 results)

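[Journal Article] Adaptation to Non-stationary Environments with a Value Function Implementing Cognitive Properties (認知特性を実装した価値関数による非定常環境への適応), 2016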
- Author(s)
  甲野佑, 高橋達二
- Journal Title
  Proceedings of the 78th National Convention of IPSJ (情報処理学会第78回全国大会予稿集)
  Volume: 1 Pages: 77-78
- NAID
  170000163318
- Related Report
  2015 Annual Research Report
[Journal Article] Robotic Action Acquisition with Cognitive Biases in Coarse-grained State Space, 2016
- Author(s)
  Daisuke Uragami, Yu Kohno, Tatsuji Takahashi
- Journal Title
  BioSystems
  Volume: In press
- Related Report
  2015 Annual Research Report
- Peer Reviewed
[Journal Article] A Reinforcement Learning Method that Explores Autonomously Using a Satisficing Value Function (満足化価値関数を用いて自律的に探索する強化学習手法), 2016
- Author(s)
  牛田有哉, 甲野佑, 高橋達二
- Journal Title
  Proceedings of the 78th National Convention of IPSJ (情報処理学会第78回全国大会予稿集)
  Volume: 1 Pages: 351-352
- NAID
  170000163449
- Related Report
  2015 Annual Research Report
[Journal Article] A cognitive satisficing strategy for bandit problems, 2015
- Author(s)
  Yu Kohno, Tatsuji Takahashi
- Journal Title
  International Journal of Parallel Emergent and Distributed Systems
  Volume: 1 Issue: 2 Pages: 1-11
- DOI
  10.1080/17445760.2015.1075531
- Related Report
  2015 Annual Research Report
- Peer Reviewed
[Journal Article] Accelerating Reinforcement Learning through Satisficing and Dynamic Updating of Its Reference (満足化とその基準の動的な更新による強化学習の促進), 2015
- Author(s)
  甲野佑, 高橋達二
- Journal Title
  JSAI 2015 (Proceedings of the 29th Annual Conference of the Japanese Society for Artificial Intelligence, 2015)
  Volume: 1 Pages: 1-4
- NAID
  130007423146
- Related Report
  2015 Annual Research Report
[Journal Article] Robot Motor Learning with a Reinforcement Learning Method Inspired by Bounded Rationality (限定合理性に触発された強化学習法によるロボット運動学習), 2015
- Author(s)
  水戸亜友美, 牛田有哉, 朝倉勇護, 甲野佑, 横須賀聡, 浦上大輔, 高橋達二
- Journal Title
  JSAI 2015 (Proceedings of the 29th Annual Conference of the Japanese Society for Artificial Intelligence, 2015)
  Volume: 1 Pages: 1-4
- NAID
  130007424937
- Related Report
  2015 Annual Research Report
[Journal Article] Optimization through Satisficing under Uncertainty (不確実性の下での満足化を通じた最適化), 2015
- Author(s)
  高橋達二, 大用庫智, 甲野佑, 横須賀聡
- Journal Title
  JSAI 2015 (Proceedings of the 29th Annual Conference of the Japanese Society for Artificial Intelligence, 2015)
  Volume: 1 Pages: 1-4
- NAID
  130007425178
- Related Report
  2015 Annual Research Report
[Journal Article] Application and Verification of Cognitive Properties for Flexible Decision Making (柔軟な意思決定機能のための認知特性の応用と検証), 2014
- Author(s)
  甲野佑, 高橋達二
- Journal Title
  Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence (人工知能学会全国大会論文集)
  Volume: 28 Pages: 1-4
- NAID
  130007423708
- Related Report
  2014 Annual Research Report
- Open Access
[Journal Article] Meaning and Application of Cognitive Properties for Unknown and Uncertain Environments (未知で不確実な環境に対する認知特性の意味と応用), 2014
- Author(s)
  甲野佑, 高橋達二
- Journal Title
  JCSS (Japanese Cognitive Science Society)
  Volume: 31 Pages: 777-782
- NAID
  40020244734
- Related Report
  2014 Annual Research Report
- Open Access / Acknowledgement Compliant
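[Presentation] Adaptation to Non-stationary Environments with a Value Function Implementing Cognitive Properties (認知特性を実装した価値関数による非定常環境への適応), 2016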
- Author(s)
  甲野佑, 高橋達二
- Organizer
  The 78th National Convention of IPSJ (情報処理学会第78回全国大会)
- Place of Presentation
  Keio University Yagami Campus, Yokohama, Kanagawa
- Year and Date
  2016-03-12
- Related Report
  2015 Annual Research Report
[Presentation] A Reinforcement Learning Method that Explores Autonomously Using a Satisficing Value Function (満足化価値関数を用いて自律的に探索する強化学習手法), 2016
- Author(s)
  牛田有哉, 甲野佑, 高橋達二
- Organizer
  The 78th National Convention of IPSJ (情報処理学会第78回全国大会)
- Place of Presentation
  Keio University Yagami Campus, Yokohama, Kanagawa
- Year and Date
  2016-03-11
- Related Report
  2015 Annual Research Report
[Presentation] A Reinforcement Learning Algorithm Based on Cognitive Satisficing (認知的満足化による強化学習アルゴリズム), 2016
- Author(s)
  甲野佑, 高橋達二
- Organizer
  The 10th Internal Measurement Workshop (第10回内部観測研究会)
- Place of Presentation
  Research Institute of Electrical Communication, Tohoku University, Sendai, Miyagi
- Year and Date
  2016-02-27
- Related Report
  2015 Annual Research Report
[Presentation] Accelerating Reinforcement Learning through Satisficing and Dynamic Updating of Its Reference (満足化とその基準の動的な更新による強化学習の促進), 2015
- Author(s)
  甲野佑, 高橋達二
- Organizer
  The 29th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2015)
- Place of Presentation
  Future University Hakodate, Hakodate, Hokkaido
- Year and Date
  2015-05-13
- Related Report
  2015 Annual Research Report
[Presentation] Robot Motor Learning with a Reinforcement Learning Method Inspired by Bounded Rationality (限定合理性に触発された強化学習法によるロボット運動学習), 2015
- Author(s)
  水戸亜友美, 牛田有哉, 朝倉勇護, 甲野佑, 横須賀聡, 浦上大輔, 高橋達二
- Organizer
  The 29th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2015)
- Place of Presentation
  Future University Hakodate, Hakodate, Hokkaido
- Year and Date
  2015-05-13
- Related Report
  2015 Annual Research Report
[Presentation] Optimization through Satisficing under Uncertainty (不確実性の下での満足化を通じた最適化), 2015
- Author(s)
  高橋達二, 大用庫智, 甲野佑, 横須賀聡
- Organizer
  The 29th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2015)
- Place of Presentation
  Future University Hakodate, Hakodate, Hokkaido
- Year and Date
  2015-05-13
- Related Report
  2015 Annual Research Report
[Presentation] Satisficing Policies and Online Equilibrium (満足化方策とオンラインな均衡), 2015
- Author(s)
  甲野佑
- Organizer
  The 9th Internal Measurement Workshop (第9回内部観測研究会)
- Place of Presentation
  Waseda University Nishi-Waseda Campus, Shinjuku, Tokyo
- Year and Date
  2015-02-28
- Related Report
  2014 Annual Research Report
[Presentation] A Satisficing Strategy with Variable Reference in the Multi-armed Bandit Problems, 2014
- Author(s)
  Yu Kohno, Tatsuji Takahashi
- Organizer
  ICNAAM 2014-ABBII
- Place of Presentation
  Rodos Palace Hotel, Rhodes, Greece
- Year and Date
  2014-09-27
- Related Report
  2014 Annual Research Report
[Presentation] Meaning and Application of Cognitive Properties for Unknown and Uncertain Environments (未知で不確実な環境に対する認知特性の意味と応用), 2014
- Author(s)
  甲野佑, 高橋達二
- Organizer
  The 31st Annual Meeting of the Japanese Cognitive Science Society
- Place of Presentation
  Nagoya University Higashiyama Campus, Chikusa-ku, Nagoya, Aichi
- Year and Date
  2014-09-20
- Related Report
  2014 Annual Research Report
[Presentation] Application and Verification of Cognitive Properties for Flexible Decision Making (柔軟な意思決定機能のための認知特性の応用と検証), 2014
- Author(s)
  甲野佑, 高橋達二
- Organizer
  The 28th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2014)
- Place of Presentation
  Ehime Prefectural Culture Hall (Himegin Hall), Matsuyama, Ehime
- Year and Date
  2014-05-13
- Related Report
  2014 Annual Research Report
