Self-Organization of Hierarchical Reinforcement Learning System

Research Project

Project/Area Number 13650480
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator(Kenkyū-buntansha) TANAKA Akira  Tohoku University, Graduate School of Engineering, Research Associate (10323057)
Project Period (FY) 2001 – 2002
Project Status Completed (Fiscal Year 2002)
Budget Amount
¥3,400,000 (Direct Cost: ¥3,400,000)
Fiscal Year 2002: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2001: ¥2,600,000 (Direct Cost: ¥2,600,000)
Keywords Reinforcement Learning / Partially Observable Markovian Environment / Q-Learning / Hierarchical Q-Learning / Learning Automaton / Switching Q-Learning / Labeling Q-Learning / Neural Network / Staged Q-Learning / Recurrent Neural Network
Research Abstract

Previously, we proposed two learning algorithms, Labeling Q-learning (LQ-learning) and Switching Q-learning (SQ-learning). The former is an algorithm with a simple structure consisting of a single agent, yet it can learn well in certain kinds of POMDP environments. The latter is a type of hierarchical Q-learning (HQ-learning) that switches among Q-modules by means of a hierarchical learning automaton, and it also works well in more complicated POMDP environments. In this study, we improved these two algorithms and developed more effective HQ-learning algorithms. Further, in order to cope with more realistic environments in which observations, actions, or both take continuous values, we conducted a basic study of function approximation by neural networks. The results are as follows.
1) We improved SQ-learning so that it works well in noisy environments. We also demonstrated that SQ-learning exhibits better performance than Wiering's HQ-learning.
2) We enhanced the performance of LQ-learning by introducing Kohonen's self-organizing map (SOM); a minimal illustrative sketch of the underlying labeling idea is given after this list.
3) We improved the self-segmentation of sequences (SSS) algorithm of Sun and Sessions, and developed a new algorithm, called SSS(λ).
4) We examined the effectiveness of SSS(λ) by applying it to a mobile-robot navigation task, in which the SOM was used for self-classification of continuous sonar observations (a sketch of such a SOM quantizer also follows the list).
5) We proposed statistical approximation learning (SAL) for simultaneous recurrent neural networks and demonstrated that it achieves high accuracy in nonlinear function approximation. We also presented a novel neural network model for incremental learning.
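
For illustration only, the following is a minimal sketch of the labeling idea behind LQ-learning as described above: a tabular Q-learner whose state is the raw observation augmented with a small discrete label chosen by the agent itself, so that observation-aliased states of a POMDP can be distinguished. This is not the project's implementation; the environment interface (env.reset, env.step) and all parameter names are hypothetical.

    # Minimal sketch of label-augmented (LQ-style) Q-learning; hypothetical code,
    # not the authors' implementation.
    import random
    from collections import defaultdict

    def lq_learning_sketch(env, n_actions, n_labels=4, episodes=500,
                           alpha=0.1, gamma=0.95, epsilon=0.1):
        # Q maps an augmented state (observation, label) to values over the joint
        # choice of (action, next label); choosing the next label is how the agent
        # carries a small amount of memory across observation-aliased states.
        n_choices = n_actions * n_labels
        Q = defaultdict(lambda: [0.0] * n_choices)

        for _ in range(episodes):
            obs = env.reset()          # hypothetical POMDP interface
            label = 0
            done = False
            while not done:
                s = (obs, label)
                # epsilon-greedy over the joint (action, next label) choice
                if random.random() < epsilon:
                    choice = random.randrange(n_choices)
                else:
                    choice = max(range(n_choices), key=lambda c: Q[s][c])
                action, next_label = divmod(choice, n_labels)

                next_obs, reward, done = env.step(action)   # hypothetical signature
                s_next = (next_obs, next_label)

                # one-step Q-learning backup on the augmented state
                target = reward + (0.0 if done else gamma * max(Q[s_next]))
                Q[s][choice] += alpha * (target - Q[s][choice])

                obs, label = next_obs, next_label
        return Q

The actual LQ-learning and SQ-learning algorithms differ in how the extra internal state (labels, or the Q-module selected by a hierarchical learning automaton) is updated; the sketch only shows the common idea of augmenting aliased observations with learned internal state.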
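
Result 4) uses a SOM to turn continuous sonar readings into discrete observation symbols for the tabular learner. Below is a minimal, hypothetical sketch of such a quantizer (a plain 1-D SOM); the unit count, learning-rate schedule, and function names are assumptions, not the project's settings.

    # Minimal 1-D self-organizing map used as an observation quantizer;
    # hypothetical sketch, not the authors' implementation.
    import numpy as np

    def train_som(data, n_units=16, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        # data: array of shape (n_samples, dim), e.g. vectors of sonar readings
        rng = np.random.default_rng(seed)
        weights = rng.normal(size=(n_units, data.shape[1]))
        for epoch in range(epochs):
            lr = lr0 * (1.0 - epoch / epochs)               # decaying learning rate
            sigma = sigma0 * (1.0 - epoch / epochs) + 1e-3  # shrinking neighborhood
            for x in rng.permutation(data):
                # best-matching unit for this sample
                bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
                # Gaussian neighborhood along the 1-D map topology
                dist = np.abs(np.arange(n_units) - bmu)
                h = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
                weights += lr * h[:, None] * (x - weights)
        return weights

    def discretize(x, weights):
        # index of the winning unit = discrete observation symbol for Q-learning
        return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

In such a setup the Q-learner never sees the raw continuous sonar vector, only the index returned by discretize, which keeps the Q-table finite.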

Report (3 results)

  • 2002 Annual Research Report
  • 2002 Final Research Report Summary
  • 2001 Annual Research Report

Research Products (42 results)

All Publications (42 results)

  • [Publications] H.Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing. 55-62 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc.of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc.of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc.of 7th Int.Symp.on Artificial Life and Robotics (AROB7th). Vol.1. 16-18 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. Vol.4, No.2. 124-129 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc.of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. Vol.122-C, No.7. 1186-1193 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. Vol.E85-D, No.9. 1425-1432 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Performance of LQ-learning in POMDP Environments"Proc.of SICE Annual Conference 2002. 922-925 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks"Proc.of SICE Annual Conference 2002. 2913-2918 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] N.Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights"Proc.of SICE Annual Conference 2002. 2903-2908 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Labeling Q-learning with SOM"Int.Conf.on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "LQ-learning with self-organizing map for POMDP environments"Proc.of 8th Int.Symp.on Artificial Life and Robotics (AROB8th). Vol.1. 345-348 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/ Distributed Computing. 55-62 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc. of 7th Int. Symp. on Artificial Life and Robotics(AROB7th ). 1. 16-18 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. 4, No. 2. 124-129 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning"Trans. IEE of Japan. 122-C, No.7. 1186-1193 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. E85-D, No. 9. 1425-1432 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Performance of LQ-learning in POMDP Environments"Proc. of SICE Annual Conference 2002. 922-925 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks"Proc. of SICE Annual Conference 2002. 2913-2918 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] N. Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights"Proc. of SICE Annual Conference 2002. 2903-2908 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Labeling Q-learning with SOM"Int. Conf. on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "LQ-learning with self-organizing map for POMDP environments"Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB8th). 1. 345-348 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. Vol.4, No.2. 124-129 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] M.Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. Vol.122-C, No.7. 1186-1193 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. Vol.E85-D, No.9. 1425-1432 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-learning with SOM"Int. Conf.on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-learning with self-organizing map for POMDP environments"Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB8th). Vol.1. 345-348 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] N.Honnma: "Stochastic Analysis of Chaos Dynamic in Recurrent Neural Networks"Proc. of IFSA/NAFIPS 2001. 298-303 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel / Distributed Computing. 55-62 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Y.Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc. of AROB 7th 2002. Vol.1. 16-18 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"Trans. on Control, Automation and Systems Engineering (ICASE). (In press). (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. (in press). (2002)

    • Related Report
      2001 Annual Research Report

Published: 2001-04-01   Modified: 2016-04-21  
