
Study on Decentralized Learning Algorithms in Non-Markovian Environments

Research Project

Project/Area Number 09650451
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Measurement and control engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi  Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator (Kenkyū-buntansha) YOSHIZAWA Makoto  Tohoku University, Graduate School of Engineering, Associate Professor (60166931)
Project Period (FY) 1997 – 1998
Project Status Completed (Fiscal Year 1998)
Budget Amount
¥2,500,000 (Direct Cost: ¥2,500,000)
Fiscal Year 1998: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 1997: ¥1,900,000 (Direct Cost: ¥1,900,000)
Keywords HIDDEN MARKOV / PARTIALLY OBSERVABLE MARKOV DECISION PROCESS / REINFORCEMENT LEARNING / Q-LEARNING / LABELING Q-LEARNING / LEARNING AUTOMATON / NEURAL NETWORKS / DECENTRALIZED LEARNING / HIDDEN MARKOV MODEL
Research Abstract

The results of this study are summarized as follows:
(1) A formal model of non-Markovian problems is the partially observable Markov decision problem (POMDP). The most useful way to overcome partial observability is to use memory to estimate the state. In this study, we proposed a new memory architecture for reinforcement learning algorithms that solves a certain type of POMDP.
The agent's task is to discover a path leading from the start position to the goal in a partially observable maze. The agent's lifetime is assumed to be separable into "trials". The basic framework of the algorithm, called labeling Q-learning, is as follows.
Let $O$ be the finite set of observations. At each step $t$, when the agent receives an observation $o_t \in O$ from the environment, a label $\theta_t$ is attached to the observation, where $\theta_t$ is an element of $\Theta = \{0, 1, 2, \ldots, M-1\}$ (at the beginning of each trial, the labels for all $o \in O$ are initialized to 0). The pair $\tilde{o}_t = (o_t, \theta_t)$ then defines a new observation, and the usual reinforcement learning algorithm TD($\lambda$) with replacing traces is applied to $\tilde{O} = O \times \Theta$, as if the pair $(o_t, \theta_t)$ had the Markov property.
(2) Labeling Q-learning was applied to test problems on simple mazes taken from the recent literature. The results demonstrated that labeling Q-learning works well, in a near-optimal manner.
(3) Most problems have a continuous or large discrete observation space. We studied generalization techniques based on recurrent neural networks (RNNs) and holon networks, which allow compact storage of similar observations. Further, we developed an approximate method of controlling the complexity, i.e., the Lyapunov exponent, of RNNs, and demonstrated the method by applying it to identification problems for certain nonlinear systems.
(4) We conducted basic experiments on sensor-based navigation for a mobile robot.
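
The labeling scheme described in item (1) can be illustrated with a minimal sketch. It is not the project's implementation: the abstract does not say how labels are updated within a trial, so the rule used here (the label of an observation advances by one, capped at M-1, each time that observation is seen again) is an assumption. The three-state maze `AliasedCorridor`, the action set, the class `LabelMemory`, and all parameter values are likewise hypothetical, and Sarsa($\lambda$) with replacing traces stands in for the TD($\lambda$) algorithm mentioned above.

```python
import random
from collections import defaultdict

M = 3             # number of labels, Theta = {0, ..., M-1} (value chosen for illustration)
ACTIONS = (0, 1)  # hypothetical action set


class LabelMemory:
    """Attaches a label theta to each observation.

    ASSUMPTION: the label of an observation advances by one (capped at
    M-1) every time that observation is seen within a trial. The source
    abstract does not specify the actual labeling rule.
    """

    def __init__(self, num_labels):
        self.num_labels = num_labels
        self.labels = {}

    def reset(self):
        # At the beginning of each trial all labels are initialized to 0.
        self.labels = {}

    def augment(self, obs):
        theta = self.labels.get(obs, 0)
        self.labels[obs] = min(theta + 1, self.num_labels - 1)
        return (obs, theta)  # the new observation (o_t, theta_t)


class AliasedCorridor:
    """Hypothetical maze: states 0 and 2 both emit observation "A"."""

    OBS = {0: "A", 1: "B", 2: "A"}
    NEXT = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 0, (2, 0): 2, (2, 1): 3}

    def reset(self):
        self.state = 0
        return self.OBS[self.state]

    def step(self, action):
        self.state = self.NEXT[(self.state, action)]
        done = self.state == 3  # state 3 is the goal
        obs = None if done else self.OBS[self.state]
        return obs, (1.0 if done else 0.0), done


def epsilon_greedy(q, x, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(x, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(x, a)] == best])


def run_trial(env, q, memory, rng, alpha=0.1, gamma=0.9, lam=0.8,
              epsilon=0.1, max_steps=200):
    """Runs one trial of Sarsa(lambda) on labeled observations; returns steps used."""
    memory.reset()
    traces = defaultdict(float)
    x = memory.augment(env.reset())
    a = epsilon_greedy(q, x, epsilon, rng)
    for step in range(1, max_steps + 1):
        obs, reward, done = env.step(a)
        if done:
            target = reward
        else:
            x_next = memory.augment(obs)
            a_next = epsilon_greedy(q, x_next, epsilon, rng)
            target = reward + gamma * q[(x_next, a_next)]
        delta = target - q[(x, a)]
        traces[(x, a)] = 1.0  # replacing trace: set to 1 instead of accumulating
        for key in list(traces):
            q[key] += alpha * delta * traces[key]
            traces[key] *= gamma * lam
        if done:
            return step
        x, a = x_next, a_next
    return max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)
    env = AliasedCorridor()
    steps = [run_trial(env, q, LabelMemory(M), rng) for _ in range(200)]
    print("steps in last 5 trials:", steps[-5:])
```

In this toy maze the two positions that emit "A" require different actions, so a memoryless policy over raw observations cannot be optimal. With labels, the first and second encounters of "A" become distinct entries (("A", 0) and ("A", 1)) in the Q-table, which is the effect the abstract describes as treating the pair as if it had the Markov property.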

Report

(3 results)
  • 1998 Annual Research Report   Final Research Report Summary
  • 1997 Annual Research Report
  • Research Products

    (22 results)


All Publications (22 results)

  • [Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of the Society of Instrument and Control Engineers. 33. 1093-1098 (1997)

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc.of International Workshop on Robot and Human Communication. 52-57 (1997)

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. 91. 43-61 (1998)

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. 2. 97-101 (1998)

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
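  • [Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of the Society of Instrument and Control Engineers. 35. 138-143 (1999)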

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. In press. (1999)

    • Description
      From the Summary of Research Results Report (Japanese version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of The Society of Instrument and Control Engineers. Vol.33. 1093-1098 (1997)

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc.of International Workshop on Robot and Human Communication. 52-57 (1997)

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. Vol.91. 43-61 (1998)

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. Vol.2. 97-101 (1998)

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of The Society of Instrument and Control Engineers. Vol.35. 138-143 (1999)

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. (in press).

    • Description
      From the Summary of Research Results Report (English version)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of the Society of Instrument and Control Engineers. 33・11. 1093-1098 (1997)

    • Related Report
      1998 Annual Research Report
  • [Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc.of International Workshop on Robot and Human Communication. 52-57 (1997)

    • Related Report
      1998 Annual Research Report
  • [Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. 91・1. 43-61 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. 2・3. 97-101 (1998)

    • Related Report
      1998 Annual Research Report
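  • [Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of the Society of Instrument and Control Engineers. 35・1. 138-143 (1999)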

    • Related Report
      1998 Annual Research Report
  • [Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. In press. (1999)

    • Related Report
      1998 Annual Research Report
  • [Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc.of International Workshop on Robot and Human Communication. 52-57 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] N.Honma: "An Autonomous Criterion of Learning Methods for Recurrent Neural Networks" Proc.of the 2nd Asian Control Conference. II. 219-222 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of the Society of Instrument and Control Engineers. Vol.33, No.11. 1093-1098 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] N.Honma: "A learning method for large-scale recurrent neural networks" Proc.of The 3rd International Symposium on ARTIFICIAL LIFE AND ROBOTICS. 358-361 (1998)

    • Related Report
      1997 Annual Research Report


Published: 1997-04-01   Modified: 2016-04-21  
