2011 Fiscal Year Annual Research Report

Deriving Learning Principles from the Characteristics of Irrational Choice Behavior

Research Project

Project/Area Number	21700174
Research Institution	Tamagawa University
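Principal Investigator	Sakai Yutaka (酒井裕), Tamagawa University, Brain Science Institute, Associate Professor (70323376)
Keywords	intelligent machines / neuroscience / reinforcement learning / animal behavior / knowledge acquisition
Research Abstract	Animals are known to prefer small immediate rewards over large future rewards (impulsive preference). This preference appears even when it is objectively and clearly disadvantageous. Impulsive preference is generally interpreted as the result of discounting the subjective value of future rewards. This project aimed to develop a method for objectively measuring the degree of this discounting (the discount rate), but in the course of the research we discovered that the discounted-value maximization problem established in conventional reinforcement learning theory can break down. In fiscal year 2011 (Heisei 23), we therefore first exposed and reexamined the problems of the conventional framework and then constructed a new one. Conventional reinforcement learning theory assumes that the information source a subject uses for action selection is given in advance rather than chosen by the subject. In actual action selection, however, a subject cannot learn to select actions unless it extracts, from all the information obtained so far, only the information that matters for the choice. We identified this point as the cause of the breakdown of the discounted-value maximization problem. We then examined a discounted value that can be defined regardless of which information source the subject uses. Because a naive extension fails to reproduce impulsive preference, we extended the notion of subjective time and constructed a framework in which action selection proceeds in time steps of variable length that tick with the occurrence of events, while discounting is determined by external time. As a result, we obtained a new discounted-value maximization problem that can be defined for any information source and that reproduces impulsive preference.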
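Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: In the course of carrying out the research plan, we discovered a flaw in the framework of conventional reinforcement learning theory (fiscal year 2010, Heisei 22) and constructed a new framework (fiscal year 2011, Heisei 23). In this respect, parts of the project have progressed beyond the original plan. On the other hand, the new discovery rendered one planned item meaningless (estimation of the discount rate using the matching law). Because the failure to proceed as planned is what produced the major advance, we consider that, overall, results are being obtained steadily.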
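Strategy for Future Research Activity	Having made a new discovery that we were not aware of at the time of application, we are prioritizing the development of the framework that this discovery opened up. We will continue to proceed along this line.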
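
The conventional interpretation mentioned in the Research Abstract, that impulsive preference results from discounting the value of future rewards, can be illustrated with a minimal sketch. It assumes standard exponential discounting with a per-step factor `gamma`; the reward sizes, delays, and function names are hypothetical values chosen for illustration and do not come from the report. It does not implement the new framework described in the report.

```python
def discounted_value(reward, delay, gamma):
    """Exponentially discounted value of a reward received after `delay` steps.

    Assumption: conventional exponential discounting, value = reward * gamma**delay.
    """
    return reward * gamma ** delay


# Hypothetical options (illustrative numbers, not data from the report).
SMALL_SOON = {"reward": 1.0, "delay": 1}   # small reward, short wait
LARGE_LATE = {"reward": 2.0, "delay": 5}   # large reward, long wait


def preferred(gamma):
    """Return the name of the option with the higher discounted value."""
    v_small = discounted_value(gamma=gamma, **SMALL_SOON)
    v_large = discounted_value(gamma=gamma, **LARGE_LATE)
    return "small_soon" if v_small > v_large else "large_late"


if __name__ == "__main__":
    for gamma in (0.8, 0.95):
        print(gamma, preferred(gamma))
```

With strong discounting (`gamma = 0.8`) the small immediate reward has the higher discounted value, while with weak discounting (`gamma = 0.95`) the large delayed reward does, which is why the discount rate is the quantity the project originally set out to measure.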

Research Products
(4 results)

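[Presentation] The discounted value problem becomes ill-posed depending on the subject's strategy (2011)
- Author(s)
  Sakai Yutaka (酒井裕)
- Organizer
  The 21st Annual Conference of the Japanese Neural Network Society
- Place of Presentation
  Okinawa Institute of Science and Technology Graduate University (Okinawa Prefecture)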
- Year and Date
  2011-12-15
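[Presentation] Rational learning strategies behind irrational behavior (2011)
- Author(s)
  Sakai Yutaka (酒井裕)
- Organizer
  The 30th Annual Meeting of the Japanese Psychonomic Society
- Place of Presentation
  Keio University, Hiyoshi Campus (Kanagawa Prefecture) (invited lecture)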
- Year and Date
  2011-12-04
[Presentation] The reward-maximization learning indifferent to historical state reproduces the preference reversal in intertemporal choice (2011)
- Author(s)
  Sakai Y
- Organizer
  The 34th Annual Meeting of the Japan Neuroscience Society
- Place of Presentation
  Pacifico Yokohama (Kanagawa Prefecture)
- Year and Date
  2011-09-17
[Presentation] Discounted value problem becomes ill-posed by subject's strategy (2011)
- Author(s)
  Sakai Y
- Organizer
  The 8th IBRO World Congress of Neuroscience
- Place of Presentation
  Fortezza da Basso (Florence, Italy)
- Year and Date
  2011-07-17
