
Self-control of Memory Structure of Reinforcement Learning in Hidden Markov Environments

Research Project

Project/Area Number 11650441
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator (Kenkyū-buntansha) HONMA Noriyasu  Tohoku University, College of Medical Sciences, Associate Professor (30282023)
Project Period (FY) 1999 – 2000
Project Status Completed (Fiscal Year 2000)
Budget Amount
¥3,500,000 (Direct Cost: ¥3,500,000)
Fiscal Year 2000: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 1999: ¥2,700,000 (Direct Cost: ¥2,700,000)
Keywords Hidden Markov / Reinforcement Learning / Q-Learning / Labeling Q-learning / Learning Automaton / Switching Q-Learning / Hierarchical Q-Learning
Research Abstract

Recent research on reinforcement learning (RL) algorithms has concentrated on partially observable Markov decision problems (POMDPs). One possible approach to POMDPs is to use history information to estimate the state, in which case Q-values must be updated in a form that reflects the past history of observation/action pairs. In this study, we developed two reinforcement learning methods that can solve certain types of POMDPs. The results are summarized as follows:
(1) Under our previous Grant-in-Aid for Scientific Research (C)(2), we proposed Labeling Q-learning (LQ-learning), which uses a new memory architecture for handling past history. In this study, we established a general framework for LQ-learning, devised several algorithms within it, and compared them through simulation. LQ-learning as originally proposed, however, has the drawback that the labeling mechanism must be predefined. To overcome this, we further devised a labeling approach based on a SOM (self-organizing feature map), in which the past history of observation/action pairs is partitioned into classes. The SOM has a one-dimensional structure, and its output nodes produce the labels.
(2) We proposed a new type of hierarchical RL called Switching Q-learning (SQ-learning). The basic idea is that non-Markovian tasks can be automatically decomposed into subtasks solvable by memoryless policies, without any additional information pointing to "good" subgoals. To handle this decomposition, SQ-learning employs ordered sequences of Q-modules, in which each module discovers a local control policy, and uses a hierarchical system of learning automata to switch between modules. The simulation results demonstrate that SQ-learning can quickly learn optimal or near-optimal policies without a heavy computational burden.
Building a unified view in which LQ-learning and SQ-learning can be treated systematically remains future work.
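The idea in (1), running Q-learning over observations augmented with a label that encodes past history, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the two-step toy task in `step`, the labeling rule in `next_label` (the label is how often the observation has already been seen in the current episode, capped at `MAX_LABEL`), and all hyperparameters. The project's actual labeling mechanisms, including the SOM-based one, are not reproduced.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)   # hypothetical action set
MAX_LABEL = 3      # hypothetical cap on label values


def step(state, action):
    """Toy task (an assumption): both hidden states emit the same observation 'x'.
    The only rewarded sequence is action 0 in state 0, then action 1 in state 1."""
    if state == 0 and action == 0:
        return 1, 0.0, False      # advance to hidden state 1
    if state == 1 and action == 1:
        return None, 1.0, True    # goal reached
    return None, 0.0, True        # any other action ends the episode


def next_label(counts, obs):
    """Assumed labeling rule: number of earlier visits to obs this episode, capped."""
    label = min(counts[obs], MAX_LABEL)
    counts[obs] += 1
    return label


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning whose table is indexed by ((observation, label), action)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, counts = 0, defaultdict(int)
        key = ("x", next_label(counts, "x"))
        done = False
        while not done:
            # epsilon-greedy selection over the labeled observation
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(key, a)])
            state, reward, done = step(state, action)
            if done:
                target = reward
            else:
                next_key = ("x", next_label(counts, "x"))
                target = reward + gamma * max(q[(next_key, a)] for a in ACTIONS)
            q[(key, action)] += alpha * (target - q[(key, action)])
            if not done:
                key = next_key
    return q


def greedy(q, key):
    """Greedy action for a labeled observation."""
    return max(ACTIONS, key=lambda a: q[(key, a)])


if __name__ == "__main__":
    q = train()
    print([greedy(q, ("x", label)) for label in (0, 1)])
```

Without the label, a single table row for observation 'x' cannot prefer action 0 on the first step and action 1 on the second; the label gives the two visits separate rows, which is the role the memory structure plays in this sketch.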

Report

(3 results)
  • 2000 Annual Research Report   Final Research Report Summary
  • 1999 Annual Research Report
  • Research Products

    (45 results)

All Publications (45 results)

  • [Publications] Alireza Fatehi: "Convergence of SOM Multiple Models Identifier"Proc.of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning For Non-Markovian Environments"Proc.of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Complexity Control Method for Recurrent Neural Networks"Proc.of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Auto-Learning by Dynamical Recognition Networks"Proc.of 1999 IEEE International Conference on SMC. Vol.III. 211-216 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Fation Sevrani: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory"Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Complexity control method by stochastic analysis for recurrent neural networks"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for partially observable markov decision process environments"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hae Yeon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States"Proc.of 15th Korea Automatic Control Conference. Vol.2. 484-487 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Complexity Control Method of Chaos Dynamics In Recurrent Neural Networks"Proc.of 15th Korea Automatic Control Conference. Vol.1. 494-497 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hiroyuki Kamaya: "Switching Q-learning in Partially Observable Markovian Environments"Proc.of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hae Yeon Lee: "Labeling Q-Learning In Hidden State Environments"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Alireza Fatehi: "Self-organizing map neural network as a multiple model identifier for time-varying systems"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Control of Chaos Dynamics in Neural Networks"Trans.of The Society of Instrument and Control Engineers. (in press). (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Alireza Fatehi and Kenichi Abe: "Convergence of SOM Multiple Models Identifier"Proc.of 1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning For Non-Markovian Environments"Proc.of 1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai, Noriyasu Honma, Kenichi Abe: "Complexity Control Method for Recurrent Neural Networks"Proc.of 1999 IEEE International Conference on SMC. Vol.I. 484-489 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Noriyasu Honma, Toshiyuki Kamauti, Kenichi Abe, and Hiroshi Takeda: "Auto-Learning by Dynamical Recognition Networks"Proc.of 1999 IEEE International Conference on SMC. Vol.II. 211-216 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Fation Sevrani and Kenichi Abe: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory"Neural Computation. 12. 451-472 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai, Noriyasu Honma, Kenichi Abe: "Complexity Control Method by Stochastic Analysis for Recurrent Neural Networks"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. Vol.1. 281-284 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, Kenichi Abe: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments"Proc. of Fifth Int.Symp.on Artificial Life and Robotics. Vol.2. 484-490 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-learning for Maze Problems with Partially Observable States"Proc.of 15th Korea Automatic Control Conference. 484-487 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method of Chaos Dynamics In Recurrent Neural Networks"Proc.of 15th Korea Automatic Control Conference. 494-497 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Hiroyuki Kamaya, Hayeon Lee, and Kenichi Abe: "Switching Q-learning in Partially Observable Markovian Environments"Proc.of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning In Hidden State Environments"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Alireza Fatehi and Kenichi Abe: "Self-organizing map neural network as a multiple model identifier for time-varying systems"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.2. 528-531 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Methods of Chaos Dynamics in Recurrent Neural Networks"Trans.of The Society of Instrument and Control Engineers. (in press).

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Masao Sakai: "Complexity Control Method by Stochastic Analysis for Recurrent Neural Networks"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. 281-284 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Haeyon Lee: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. 484-490 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] K.Sugawara: "Collective Behavior of Multi-agent System with Simple Interaction"Proc.of Fifth Int.Symp.on Artificial Life and Robotics. 725-727 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Haeyon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States"Proc.of 15th Korea Automatic Control Conference. Vol.2. 484-487 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Masao Sakai: "Complexity Control Method of Chaos Dynamics In Recurrent Neural Networks"Proc.of 15th Korea Automatic Control Conference. Vol.1. 281-284 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Hiroyuki Kamaya: "Switching Q-learning in Partially Observable Markovian Environments"Proc.of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol.2. 1062-1067 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Haeyon Lee: "Labeling Q-Learning In Hidden State Environments"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.1. 208-211 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Masao Sakai: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks"Proc.of Sixth Int.Symp.on Artificial Life and Robotics. Vol.2. 478-481 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Masao Sakai: "Complexity Control Method for Recurrent Neural Networks"1999 IEEE International Conference on SMC. Vol. I. 484-489 (1999)

    • Related Report
      1999 Annual Research Report
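  • [Publications] Hiroyuki Kamaya: "Switching Q-learning in Hidden Markov Environments"Proc.of the 35th Anniversary Lecture Meeting of the Society of Instrument and Control Engineers, Tohoku Chapter. 7-8 (1999)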

    • Related Report
      1999 Annual Research Report
  • [Publications] Alireza Fatehi: "PLANT IDENTIFICATION BY SOM NEURAL NETWORKS"ECC'99. Time ID:BP-3. Paper ID:F190. (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] Noriyasu Honma: "Auto-Learning by Dynamical Recognition Networks"1999 IEEE International Conference on SMC. Vol.III. 211-216 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] Alireza Fatehi: "Convergence of SOM Multiple Models Identifier"1999 IEEE International Conference on SMC. Vol.IV. 1074-1077 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] HaeYeon Lee: "Labeling Q-Learning For Non-Markovian Environments"1999 IEEE International Conference on SMC. Vol.V. 487-491 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] HaeYeon Lee: "Labeling Q-learning for partially observable markov decision process environments"AROB 5th '00. Vol.2. 281-284 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] Masao Sakai: "Complexity control method by a stochastic analysis for recurrent neural networks"AROB 5th '00. Vol.1. 484-487 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] Ikuo Yoshihara: "Extending prediction term of GP-based time series model"AROB 5th '00. Vol.1. 268-271 (2000)

    • Related Report
      1999 Annual Research Report

Published: 1999-04-01   Modified: 2016-04-21  
