Self-Organization of Hierarchical Reinforcement Learning System

Research Project

Project/Area Number 13650480
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator(Kenkyū-buntansha) TANAKA Akira  Tohoku University, Graduate School of Engineering, Research Associate (10323057)
Project Period (FY) 2001 – 2002
Project Status Completed (Fiscal Year 2002)
Budget Amount
¥3,400,000 (Direct Cost: ¥3,400,000)
Fiscal Year 2002: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2001: ¥2,600,000 (Direct Cost: ¥2,600,000)
Keywords Reinforcement Learning / Partially Observable Markovian Environment / Q-Learning / Hierarchical Q-Learning / Learning Automaton / Switching Q-Learning / Labeling Q-Learning / Neural Network / Staged Q-Learning / Recurrent Neural Network
Research Abstract

Previously, we proposed two learning algorithms, Labeling Q-learning (LQ-learning) and Switching Q-learning (SQ-learning). The former is an algorithm with a simple structure consisting of a single agent, yet it can learn well in certain kinds of POMDP environments. The latter is a type of hierarchical Q-learning (HQ-learning) that switches among Q-modules by means of a hierarchical learning automaton, and it also works well in more complicated POMDP environments. In this study, we improved these two algorithms and developed more effective HQ-learning algorithms. Further, in order to cope with more realistic environments in which observations, actions, or both take continuous values, we conducted a basic study of function approximation by neural networks. The results are as follows.
1) We improved SQ-learning so that it works well in noisy environments. We also demonstrated that SQ-learning exhibits better performance than Wiering's HQ-learning.
2) We enhanced the performance of LQ-learning by introducing Kohonen's self-organizing map (SOM); a minimal illustrative sketch of the underlying labeling idea is given after this list.
3) We improved the self-segmentation of sequences (SSS) algorithm of Sun and Sessions, and developed a new algorithm, called SSS(λ).
4) We examined the effectiveness of SSS(λ) by applying it to a mobile-robot navigation task, in which the SOM was used for self-classification of continuous sonar observations (a sketch of such a SOM quantizer also follows the list).
5) We proposed statistical approximation learning (SAL) for simultaneous recurrent neural networks and demonstrated that it achieves high accuracy in nonlinear function approximation. We also presented a novel neural network model for incremental learning.
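
For illustration only, the following is a minimal sketch of the labeling idea behind LQ-learning as described above: a tabular Q-learner whose state is the raw observation augmented with a small discrete label chosen by the agent itself, so that observation-aliased states of a POMDP can be distinguished. This is not the project's implementation; the environment interface (env.reset, env.step) and all parameter names are hypothetical.

    # Minimal sketch of label-augmented (LQ-style) Q-learning; hypothetical code,
    # not the authors' implementation.
    import random
    from collections import defaultdict

    def lq_learning_sketch(env, n_actions, n_labels=4, episodes=500,
                           alpha=0.1, gamma=0.95, epsilon=0.1):
        # Q maps an augmented state (observation, label) to values over the joint
        # choice of (action, next label); choosing the next label is how the agent
        # carries a small amount of memory across observation-aliased states.
        n_choices = n_actions * n_labels
        Q = defaultdict(lambda: [0.0] * n_choices)

        for _ in range(episodes):
            obs = env.reset()          # hypothetical POMDP interface
            label = 0
            done = False
            while not done:
                s = (obs, label)
                # epsilon-greedy over the joint (action, next label) choice
                if random.random() < epsilon:
                    choice = random.randrange(n_choices)
                else:
                    choice = max(range(n_choices), key=lambda c: Q[s][c])
                action, next_label = divmod(choice, n_labels)

                next_obs, reward, done = env.step(action)   # hypothetical signature
                s_next = (next_obs, next_label)

                # one-step Q-learning backup on the augmented state
                target = reward + (0.0 if done else gamma * max(Q[s_next]))
                Q[s][choice] += alpha * (target - Q[s][choice])

                obs, label = next_obs, next_label
        return Q

The actual LQ-learning and SQ-learning algorithms differ in how the extra internal state (labels, or the Q-module selected by a hierarchical learning automaton) is updated; the sketch only shows the common idea of augmenting aliased observations with learned internal state.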
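
Result 4) uses a SOM to turn continuous sonar readings into discrete observation symbols for the tabular learner. Below is a minimal, hypothetical sketch of such a quantizer (a plain 1-D SOM); the unit count, learning-rate schedule, and function names are assumptions, not the project's settings.

    # Minimal 1-D self-organizing map used as an observation quantizer;
    # hypothetical sketch, not the authors' implementation.
    import numpy as np

    def train_som(data, n_units=16, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        # data: array of shape (n_samples, dim), e.g. vectors of sonar readings
        rng = np.random.default_rng(seed)
        weights = rng.normal(size=(n_units, data.shape[1]))
        for epoch in range(epochs):
            lr = lr0 * (1.0 - epoch / epochs)               # decaying learning rate
            sigma = sigma0 * (1.0 - epoch / epochs) + 1e-3  # shrinking neighborhood
            for x in rng.permutation(data):
                # best-matching unit for this sample
                bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
                # Gaussian neighborhood along the 1-D map topology
                dist = np.abs(np.arange(n_units) - bmu)
                h = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
                weights += lr * h[:, None] * (x - weights)
        return weights

    def discretize(x, weights):
        # index of the winning unit = discrete observation symbol for Q-learning
        return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

In such a setup the Q-learner never sees the raw continuous sonar vector, only the index returned by discretize, which keeps the Q-table finite.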

Report (3 results)

  • 2002 Annual Research Report
  • 2002 Final Research Report Summary
  • 2001 Annual Research Report

Research Products (42 results)

All Publications (42 results)

  • [Publications] H.Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/Distributed Computing. 55-62 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc.of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc.of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc.of 7th Int.Symp.on Artificial Life and Robotics (AROB7th). Vol.1. 16-18 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. Vol.4, No.2. 124-129 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc.of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. Vol.122-C, No.7. 1186-1193 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. Vol.E85-D, No.9. 1425-1432 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Performance of LQ-learning in POMDP Environments"Proc.of SICE Annual Conference 2002. 922-925 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks"Proc.of SICE Annual Conference 2002. 2913-2918 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] N.Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights"Proc.of SICE Annual Conference 2002. 2903-2908 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "Labeling Q-learning with SOM"Int.Conf.on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H.Y.Lee: "LQ-learning with self-organizing map for POMDP environments"Proc.of 8th Int.Symp.on Artificial Life and Robotics (AROB8th). Vol.1. 345-348 (2002)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel/ Distributed Computing. 55-62 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc. of 7th Int. Symp. on Artificial Life and Robotics(AROB7th ). 1. 16-18 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. 4, No. 2. 124-129 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments-A Proposal of Switching Q-learning"Trans. IEE of Japan. 122-C, No.7. 1186-1193 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. E85-D, No. 9. 1425-1432 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Performance of LQ-learning in POMDP Environments"Proc. of SICE Annual Conference 2002. 922-925 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M. Sakai: "Statistical Learning Method of Discontinuous Functions using Simultaneous Recurrent Networks"Proc. of SICE Annual Conference 2002. 2913-2918 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] N. Honnma: "Superimposing Memory by Dynamic and Spatial Changing Synaptic Weights"Proc. of SICE Annual Conference 2002. 2903-2908 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "Labeling Q-learning with SOM"Int. Conf. on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] H. Y. Lee: "LQ-learning with self-organizing map for POMDP environments"Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB8th). 1. 345-348 (2002)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"The Institute of Control, Automation and Systems Engineering. Vol.4, No.2. 124-129 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] M.Sakai: "A Statistical Approximation Learning Method for Simultaneous Recurrent Networks"Proc. of the 15th IFAC World Congress on Automatic Control. 2491-2496 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. Vol.122-C, No.7. 1186-1193 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-Learning in POMDP Environments"IEICE TRANS. on Information and Systems. Vol.E85-D, No.9. 1425-1432 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Kamaya: "Self-Segmentation of Sequences Algorithm with Eligibility Traces in POMDPs"Proceedings of the 4th Asian Control Conference (ASCC 2002). 408-413 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-learning with SOM"Int. Conf.on Control, Automation, and Systems(ICCAS 2002). 105-109 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] H.Y.Lee: "Labeling Q-learning with self-organizing map for POMDP environments"Proc. of 8th Int. Symp. on Artificial Life and Robotics (AROB8th). Vol.1. 345-348 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] N.Honnma: "Stochastic Analysis of Chaos Dynamic in Recurrent Neural Networks"Proc. of IFSA/NAFIPS 2001. 298-303 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Self-Segmentation Algorithms for Q-learning in Non-Markovian Environments"2nd International Conference on Software Engineering, Artificial Intelligence, Networking & Parallel / Distributed Computing. 55-62 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Y.Lee: "Flexible Labeling Mechanism in LQ-learning for Maze Problems"Proc. of the International Conference on Control, Automation and Systems. 5-8 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Control of Chaos Dynamics in Jordan Recurrent Neural Networks"Proc. of the International Conference on Control, Automation and Systems. 292-295 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Learning method by a statistical approximation for simultaneous recurrent networks"Proc. of AROB 7th 2002. Vol.1. 16-18 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] M.Sakai: "Complexity Control Method of Chaos Dynamics in Recurrent Neural Networks"Trans. on Control, Automation and Systems Engineering (ICASE). (In press). (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Kamaya: "Hierarchical Reinforcement Learning in Partially Observable Markovian Environments - A Proposal of Switching Q-learning" (in Japanese). Trans. IEE of Japan. (in press). (2002)

    • Related Report
      2001 Annual Research Report

Published: 2001-04-01   Modified: 2016-04-21  
