2016 Fiscal Year Annual Research Report

Elucidation of the Mathematical Basis and Neural Mechanisms of Multi-layer Representation Learning

Planned Research

Project Area	Correspondence and Fusion of Artificial Intelligence and Brain Science
Project/Area Number	16H06563
Research Institution	Okinawa Institute of Science and Technology Graduate University
Principal Investigator	Kenji Doya, Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit, Professor (80188846)
Project Period (FY)	2016-06-30 – 2021-03-31
Keywords	Deep learning / Reinforcement learning / Module self-organization
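Outline of Annual Research Achievements	1) Mathematical basis of multi-layer representation learning: Deep Q-Network (DQN), the conventional method for applying deep learning to reinforcement learning, guarantees learning stability by keeping the network that predicts future reward fixed, without training it, for a set period, an approach that sacrifices data efficiency. To improve on this, we carried out a mathematical analysis of convergence speed within the framework of Approximate Value Iteration, which generalizes the DQN algorithm, and on that basis derived a new algorithm that can update the future-reward prediction network sooner. Simulation experiments on a large number of game tasks confirmed that data efficiency improved on many of them. 2) Neural mechanisms of multi-layer representation learning: To clarify how information representations are acquired in the basal ganglia, we performed new optical neural activity recording experiments that distinguish cells in different compartments of the striatum. As a result, we newly found that there are neuronal populations involved in reward prediction in different phases of behavioral learning, early and late. Furthermore, to clarify the computational mechanisms of predictive information representation in the cerebral cortex, we developed a new behavioral paradigm in which mice discriminate minute movements of a lever and operate it, and we built the experimental apparatus and developed the control software for it. In addition, to record neural activity in different cortical layers simultaneously, we set up a new optical neural activity recording system using an endoscope and a prism.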
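Current Status of Research Progress	2: Research has progressed on the whole more than originally planned. Reason: 1) Mathematical basis of multi-layer representation learning: We derived a new algorithm that raises data efficiency while preserving learning stability when deep learning is used for reinforcement learning, and confirmed its performance improvement in simulation experiments. 2) Neural mechanisms of multi-layer representation learning: Through optical neural activity recording and data analysis, we obtained the new finding that the striatum of the basal ganglia contains neuronal populations involved in reward prediction in different phases of behavioral learning.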
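Strategy for Future Research Activity	1) Mathematical basis of multi-layer representation learning: To further improve the data efficiency of reinforcement learning with deep learning, we will continue developing more efficient algorithms based on the mathematical analysis of Approximate Value Iteration. 2) Neural mechanisms of multi-layer representation learning: We will advance the optical neural activity recording experiments on predictive information representation in the cerebral cortex and explore how dynamic information representations differ across the layers of the primary sensory and primary motor cortices.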
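
The trade-off described in the outline, where bootstrapping from a frozen copy of the value estimates stabilizes learning but slows the use of new data, can be illustrated with a minimal sketch. This is not the algorithm derived in the project and not DQN itself: it is tabular Q-learning on a hypothetical five-state chain, and the function name `train_chain`, the parameter `sync_every`, and all numeric settings are assumptions made for illustration only.

```python
import random


def train_chain(sync_every, steps=2000, n_states=5, gamma=0.9, lr=0.5, seed=0):
    """Tabular Q-learning on a toy chain, bootstrapping from a frozen copy.

    Hypothetical toy setup: states 0..n_states-1, actions 0 (left) and
    1 (right); reaching the last state gives reward 1.0 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]   # online estimates
    q_target = [row[:] for row in q]            # frozen copy used for targets
    state = 0
    for step in range(1, steps + 1):
        action = rng.randrange(2)               # uniform random exploration
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == n_states - 1
        reward = 1.0 if done else 0.0
        # Bootstrap from the frozen copy, not from the online table.
        bootstrap = 0.0 if done else gamma * max(q_target[next_state])
        q[state][action] += lr * (reward + bootstrap - q[state][action])
        if step % sync_every == 0:              # periodic hard update of the copy
            q_target = [row[:] for row in q]
        state = 0 if done else next_state
    return q


if __name__ == "__main__":
    for sync_every in (1, 50, 500):
        q = train_chain(sync_every)
        print(sync_every, round(q[0][1], 3))
```

In this toy setting, a larger `sync_every` means that reward information reaches the start state only after more synchronizations of the frozen copy, which is the data-efficiency cost the outline refers to. If the copy is never synchronized, only the action that directly earns the reward acquires a nonzero value.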

Research Products
(4 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result, Open Access: 1 result) Presentation (2 results) (of which Int'l Joint Research: 2 results, Invited: 1 result) Remarks (1 result)

[Journal Article] Adaptive Baseline Enhances EM-based Policy Search: Validation in a View-based Positioning Task of a Smartphone Balancer (2017)
- Author(s)
  Jiexin Wang, Eiji Uchibe, Kenji Doya
- Journal Title
  Frontiers in Neurorobotics
  Volume: 11 Pages: 1-15
- DOI
  10.3389/fnbot.2017.00001
- Peer Reviewed / Open Access
[Presentation] Fast Adaptation of Behavior to Changing Goals with a Gamma Ensemble (2017)
- Author(s)
  Chris Reinke, Eiji Uchibe, Kenji Doya
- Organizer
  The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2017)
- Place of Presentation
  The University of Michigan, Ann Arbor, Michigan, U.S.A.
- Year and Date
  2017-06-11 – 2017-06-14
- Int'l Joint Research
[Presentation] Coding of action and state values in the striatal compartments (2017)
- Author(s)
  Kenji Doya
- Organizer
  12th International Basal Ganglia Society Meeting (IBAGS-XII 2017)
- Place of Presentation
  Merida, Yucatan, Mexico
- Year and Date
  2017-03-26 – 2017-03-30
- Int'l Joint Research / Invited
[Remarks] Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit
- URL
  https://groups.oist.jp/ncu
