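Research on Self-Organization of Hierarchical Reinforcement Learning Mechanisms

Research Project

Project/Area Number	13650480
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Control engineering
Research Institution	Tohoku University
Principal Investigator	Kenichi Abe (阿部 健一), Tohoku University, Graduate School of Engineering, Professor (70005403)
Co-Investigator	Akira Tanaka (田中 明), Tohoku University, Graduate School of Engineering, Research Associate (10323057)
Project Period (FY)	2001 – 2002
Project Status	Completed (Fiscal Year 2002)
Budget Amount	Total: ¥3,400 thousand (direct costs: ¥3,400 thousand); Fiscal Year 2002: ¥800 thousand (direct costs: ¥800 thousand); Fiscal Year 2001: ¥2,600 thousand (direct costs: ¥2,600 thousand)
Keywords	Reinforcement learning / Partially observable Markov environment / Q-learning / Hierarchical Q-learning / Learning automata / Switching Q-learning / Labeling Q-learning / Neural networks / Stepwise Q-learning / Recurrent neural networks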
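Outline of Research	We previously proposed two algorithms: labeling Q-learning (LQ-learning) and switching Q-learning (SQ-learning). The former has a simple structure built around a single agent, yet it learns successfully in certain classes of POMDP environments. The latter is a form of hierarchical reinforcement learning (HQ-learning) in which a hierarchical learning automaton switches among a large number of Q-modules; it is applicable to somewhat more complex POMDP environments. In this project we improved both algorithms and developed HQ-learning methods that can handle more complex POMDP environments. In addition, to apply these algorithms to more practical problems in which both observations and actions take continuous values, we carried out a basic study of function approximation with recurrent neural networks (RNNs). The results of the project are as follows.
1) SQ-learning was improved so that its learning performance is maintained even in noisy environments. Simulation experiments comparing it with the HQ-learning of Wiering et al. confirmed that the proposed algorithm achieves better learning performance.
2) A self-organizing map (SOM) was introduced into LQ-learning to further improve its learning performance.
3) The SSS method of Sun et al. was improved, yielding two algorithms: a modified SSS method, and SSS(λ), which incorporates the idea of eligibility traces.
4) SSS(λ) was applied to a navigation task for a mobile robot, and its effectiveness was confirmed. For this application we devised a new method in which the multidimensional data obtained from the robot's external sensors are classified automatically by a self-organizing algorithm such as a SOM, and the resulting classes are used as the observations of SSS(λ).
5) For a class of RNN called the simultaneous recurrent network (SRN), we proposed a new learning method called statistical approximation learning (SAL). Simulations confirmed that SAL accurately approximates strongly nonlinear functions that conventional methods fail to learn. A new algorithm for incremental (additional) learning was also proposed.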
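The pipeline described in item 4 (self-organizing classification of sensor data feeding a trace-based Q-learner) can be illustrated with a minimal sketch. This is not the project's SSS(λ) algorithm, whose details are not given in this record; it pairs a tiny one-dimensional SOM with a plain tabular Q-learning update using accumulating eligibility traces (a naive Q(λ) that does not cut traces after exploratory actions). The function names `train_som`, `quantize`, and `q_lambda_step`, the deterministic initialisation, the triangular neighbourhood, and all parameter values are assumptions made for illustration.

```python
import random


def quantize(units, x):
    """Return the index of the SOM unit closest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(units[j], x)))


def train_som(data, n_units, epochs=20, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM (illustrative only) and return its unit weight vectors."""
    rng = random.Random(seed)
    # Assumed deterministic initialisation: evenly spaced picks from the data set.
    units = [list(data[(i * (len(data) - 1)) // max(1, n_units - 1)])
             for i in range(n_units)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled presentation order
            frac = 1.0 - step / total              # decays from 1 towards 0
            lr, radius = lr0 * frac, (n_units / 2) * frac
            bmu = quantize(units, x)               # best-matching unit
            for j, w in enumerate(units):
                d = abs(j - bmu)
                if d <= radius:                    # the BMU itself always qualifies
                    h = 1.0 - d / (radius + 1.0)   # triangular neighbourhood weight
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return units


def q_lambda_step(Q, E, o, a, r, o_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Q-learning update with accumulating eligibility traces.

    Q and E are lists of lists indexed as [observation][action]; both are
    modified in place. Observations are discrete indices, e.g. SOM unit ids.
    """
    delta = r + gamma * max(Q[o_next]) - Q[o][a]   # TD error
    E[o][a] += 1.0                                 # mark the visited pair
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]     # credit recent pairs
            E[i][j] *= gamma * lam                 # decay every trace


if __name__ == "__main__":
    # Hypothetical 2-D sensor readings forming two well-separated groups.
    readings = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    som = train_som(readings, n_units=2)
    n_actions = 2
    Q = [[0.0] * n_actions for _ in som]
    E = [[0.0] * n_actions for _ in som]
    o, o_next = quantize(som, [0, 1]), quantize(som, [10, 11])
    q_lambda_step(Q, E, o, a=1, r=1.0, o_next=o_next)
    print(Q)
```

The SOM here only turns continuous sensor vectors into discrete observation indices; the learner never sees the raw vectors. In a POMDP such indices are still only partial observations, which is the difficulty the hierarchical and labeling methods in this project are aimed at.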

Reports

(3 results)

2002 Annual Research Report / Final Research Report Summary
2001 Annual Research Report

Research Products
(42 results)

All Publications (42 results)

[Publications] H. Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments" 2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing. 55-62 (2001)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems" Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks" Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks" Proc. of 7th Int. Symp. on Artificial Life and Robotics (AROB 7th). Vol. 1. 16-18 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" The Institute of Control, Automation and Systems Engineering. Vol. 4, No. 2. 124-129 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks" Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Kamaya (釜谷博行): "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning" (in Japanese) Trans. IEE of Japan (電気学会論文誌C). Vol. 122-C, No. 7. 1186-1193 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Labeling Q-Learning in POMDP Environments" IEICE Trans. on Information and Systems. Vol. E85-D, No. 9. 1425-1432 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Performance of LQ-learning in POMDP Environments" Proc. of SICE Annual Conference 2002. 922-925 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks" Proc. of SICE Annual Conference 2002. 2913-2918 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] N. Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights" Proc. of SICE Annual Conference 2002. 2903-2908 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs" Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Labeling Q-learning with SOM" Int. Conf. on Control, Automation, and Systems (ICCAS 2002). 105-109 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "LQ-learning with self-organizing map for POMDP environments" Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB 8th). Vol. 1. 345-348 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments" 2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing. 55-62 (2001)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems" Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks" Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks" Proc. of 7th Int. Symp. on Artificial Life and Robotics (AROB 7th). Vol. 1. 16-18 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" The Institute of Control, Automation and Systems Engineering. Vol. 4, No. 2. 124-129 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks" Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning" Trans. IEE of Japan. Vol. 122-C, No. 7. 1186-1193 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Labeling Q-Learning in POMDP Environments" IEICE Trans. on Information and Systems. Vol. E85-D, No. 9. 1425-1432 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Performance of LQ-learning in POMDP Environments" Proc. of SICE Annual Conference 2002. 922-925 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks" Proc. of SICE Annual Conference 2002. 2913-2918 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] N. Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights" Proc. of SICE Annual Conference 2002. 2903-2908 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs" Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "Labeling Q-learning with SOM" Int. Conf. on Control, Automation, and Systems (ICCAS 2002). 105-109 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] H. Y. Lee: "LQ-learning with self-organizing map for POMDP environments" Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB 8th). Vol. 1. 345-348 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" The Institute of Control, Automation and Systems Engineering. Vol. 4, No. 2. 124-129 (2002)
- Related Report
  2002 Annual Research Report
[Publications] M. Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks" Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Kamaya (釜谷博行): "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning" (in Japanese) Trans. IEE of Japan (電気学会論文誌C). Vol. 122-C, No. 7. 1186-1193 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Y. Lee: "Labeling Q-Learning in POMDP Environments" IEICE Trans. on Information and Systems. Vol. E85-D, No. 9. 1425-1432 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs" Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Y. Lee: "Labeling Q-learning with SOM" Int. Conf. on Control, Automation, and Systems (ICCAS 2002). 105-109 (2002)
- Related Report
  2002 Annual Research Report
[Publications] H. Y. Lee: "Labeling Q-learning with self-organizing map for POMDP environments" Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB 8th). Vol. 1. 345-348 (2002)
- Related Report
  2002 Annual Research Report
[Publications] N. Honnma: "Stochastic Analysis of Chaos Dynamic in Recurrent Neural Networks" Proc. of IFSA/NAFIPS 2001. 298-303 (2001)
- Related Report
  2001 Annual Research Report
[Publications] H. Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments" 2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing. 55-62 (2001)
- Related Report
  2001 Annual Research Report
[Publications] H. Y. Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems" Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)
- Related Report
  2001 Annual Research Report
[Publications] M. Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks" Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)
- Related Report
  2001 Annual Research Report
[Publications] M. Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks" Proc. of AROB 7th 2002. Vol. 1. 16-18 (2002)
- Related Report
  2001 Annual Research Report
[Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks" Trans. on Control, Automation and Systems Engineering (ICASE). (In press). (2002)
- Related Report
  2001 Annual Research Report
[Publications] H. Kamaya (釜谷博行): "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning" (in Japanese) Trans. IEE of Japan (電気学会論文誌C). (In press). (2002)
- Related Report
  2001 Annual Research Report
