
1998 Fiscal Year Final Research Report Summary

Study on Decentralized Learning Algorithms in Non-Markovian Environments

Research Project

Project/Area Number 09650451
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Measurement and Control Engineering
Research Institution Tohoku University

Principal Investigator

ABE Kenichi, Tohoku University, Graduate School of Engineering, Professor (70005403)

Co-Investigator (Kenkyū-buntansha) YOSHIZAWA Makoto, Tohoku University, Graduate School of Engineering, Associate Professor (60166931)
Project Period (FY) 1997 – 1998
Keywords HIDDEN MARKOV / PARTIALLY OBSERVABLE MARKOV DECISION PROCESS / REINFORCEMENT LEARNING / Q-LEARNING / LABELING Q-LEARNING / LEARNING AUTOMATON / NEURAL NETWORKS / DECENTRALIZED LEARNING
Research Abstract

The results of this study are summarized as follows:
(1) The standard formal model of non-Markovian problems is the partially observable Markov decision process (POMDP). The most useful way to overcome partial observability is to use memory to estimate state. In this study, we proposed a new memory architecture for reinforcement learning algorithms that solves a certain class of POMDPs.
The agent's task is to discover a path leading from the start position to the goal in a partially observable maze. The agent's lifetime is assumed to be separable into "trials". The basic framework of the algorithm, called labeling Q-learning, is as follows.
Let O be the finite set of observations. At each step t, when the agent receives an observation o_t ∈ O from the environment, a label θ_t is attached to it, where θ_t ∈ Θ = {0, 1, 2, …, M−1} (at the beginning of each trial, the labels for all o ∈ O are initialized to 0). The pair ō_t = (o_t, θ_t) then defines a new observation, and the usual reinforcement learning algorithm TD(λ) with replacing traces is applied to the product space Ō = O × Θ, as if the pairs (o_t, θ_t) had the Markov property.
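The labeling mechanism described above can be sketched in code. The summary specifies only that each observation is paired with a label from {0, …, M−1}, that labels are reset at the start of every trial, and that ordinary TD learning is run on the augmented pairs. The label-update rule below (increment an observation's label, mod M, on each revisit within a trial), the class and its names, and the one-step Q-update standing in for TD(λ) with replacing traces are all illustrative assumptions, not the study's actual method.

```python
from collections import defaultdict

M = 4  # size of the label set Theta = {0, ..., M-1}

class LabelingAgent:
    """Illustrative sketch of the labeling layer of labeling Q-learning.

    Each raw observation o_t is paired with a label theta_t, giving an
    augmented state (o_t, theta_t) that a standard tabular update treats
    as Markovian.  The label-update rule (visit count mod M) is an
    assumption; the summary does not specify it.
    """

    def __init__(self, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)      # Q-table over (obs, label, action)
        self.alpha, self.gamma = alpha, gamma
        self.begin_trial()

    def begin_trial(self):
        # Labels for all observations are re-initialized at trial start.
        self.visits = defaultdict(int)

    def observe(self, obs):
        label = self.visits[obs] % M     # assumed rule: visit count mod M
        self.visits[obs] += 1
        return (obs, label)              # augmented observation (o_t, theta_t)

    def update(self, state, action, reward, next_state, actions, done):
        # One-step Q-update over augmented states (the study itself uses
        # TD(lambda) with replacing traces).
        best = 0.0 if done else max(self.q[next_state + (a,)] for a in actions)
        key = state + (action,)
        self.q[key] += self.alpha * (reward + self.gamma * best - self.q[key])

# Two visits to the same aliased observation 'A' within one trial map to
# distinct augmented states; a new trial resets the labels.
agent = LabelingAgent()
s1 = agent.observe('A')   # ('A', 0)
s2 = agent.observe('A')   # ('A', 1)
agent.begin_trial()
s3 = agent.observe('A')   # ('A', 0)
```

Because two visits to the same aliased observation receive distinct labels, a perceptually aliased maze cell can be assigned different actions on its first and second encounter within a trial.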
(2) Labeling Q-learning was applied to test problems of simple mazes taken from the recent literature. The results demonstrated that labeling Q-learning performs well, in a near-optimal manner.
(3) Most practical problems have continuous or large discrete observation spaces. We studied generalization techniques based on recurrent neural networks (RNNs) and holon networks, which allow compact storage of similar observations. Further, we developed an approximate method of controlling the complexity, i.e., the Lyapunov exponent, of RNNs, and demonstrated the method by applying it to identification problems for certain nonlinear systems.
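Item (3) uses the largest Lyapunov exponent as the measure of an RNN's dynamical complexity. The report's control method is not given in this summary; the following sketch only illustrates how that exponent can be estimated for a recurrent map, using the standard two-trajectory (Benettin-style) procedure. The one-neuron unit and its parameters are assumptions chosen for illustration.

```python
import math

def largest_lyapunov(f, x0, n_steps=2000, d0=1e-8):
    """Benettin-style estimate of the largest Lyapunov exponent of a 1-D map.

    Tracks a reference orbit and a nearby orbit, accumulates the log of
    the per-step growth of their separation, and renormalizes the
    perturbation back to size d0 after every step.
    """
    x, y, total = x0, x0 + d0, 0.0
    for _ in range(n_steps):
        x, y = f(x), f(y)
        d = abs(y - x)
        if d == 0.0:                     # guard: restart the perturbation
            d = d0
            y = x + d0
        else:
            y = x + d0 * (y - x) / d     # renormalize the perturbation
        total += math.log(d / d0)
    return total / n_steps

# A one-neuron recurrent unit x_{t+1} = tanh(w * x_t + b).  With |w| < 1
# the dynamics contract everywhere, so the exponent is negative; driving
# |w| up pushes the unit toward more complex (eventually chaotic) dynamics.
print(largest_lyapunov(lambda x: math.tanh(0.5 * x + 0.1), 0.3))
```

For comparison, applying the same estimator to the chaotic logistic map 4x(1−x) yields a positive exponent near ln 2, which is the kind of quantity a complexity-control method would regulate.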
(4) We conducted fundamental experiments on sensor-based navigation for a mobile robot.

  • Research Products

All Publications (6 results)

  • [Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of the Society of Instrument and Control Engineers. Vol.33. 1093-1098 (1997)
  • [Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc. of International Workshop on Robot and Human Communication. 52-57 (1997)
  • [Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. Vol.91. 43-61 (1998)
  • [Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. Vol.2. 97-101 (1998)
  • [Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of the Society of Instrument and Control Engineers. Vol.35. 138-143 (1999)
  • [Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. (in press) (1999)


Published: 1999-12-08  
