Study on Decentralized Learning Algorithms in Non-Markovian Environments

Research Project

Project/Area Number	09650451
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Measurement and control engineering
Research Institution	Tohoku University
Principal Investigator	ABE Kenichi, Tohoku University, Graduate School of Engineering, Professor (70005403)
Co-Investigator (Kenkyū-buntansha)	YOSHIZAWA Makoto, Tohoku University, Graduate School of Engineering, Associate Professor (60166931)
Project Period (FY)	1997 – 1998
Project Status	Completed (Fiscal Year 1998)
Budget Amount	¥2,500,000 (Direct Cost: ¥2,500,000); Fiscal Year 1998: ¥600,000 (Direct Cost: ¥600,000); Fiscal Year 1997: ¥1,900,000 (Direct Cost: ¥1,900,000)
Keywords	HIDDEN MARKOV / PARTIALLY OBSERVABLE MARKOV DECISION PROCESS / REINFORCEMENT LEARNING / Q-LEARNING / LABELING Q-LEARNING / LEARNING AUTOMATON / NEURAL NETWORKS / DECENTRALIZED LEARNING / HIDDEN MARKOV MODEL
Research Abstract	The results of this study are summarized as follows. (1) A formal model of non-Markovian problems is the partially observable Markov decision process (POMDP). The most useful approach to overcoming partial observability is to use memory to estimate the state. In this study, we proposed a new memory architecture for reinforcement learning algorithms that solves a certain type of POMDP. The agent's task is to discover a path leading from a start position to a goal in a partially observable maze. The agent's lifetime is assumed to be separable into "trials". The basic framework of the algorithm, called labeling Q-learning, is as follows. Let $O$ be the finite set of observations. At each step $t$, when the agent gets an observation $o_t \in O$ from the environment, a label $\theta_t$ is attached to the observation, where $\theta_t$ is an element of $\Theta = \{0, 1, 2, \ldots, M-1\}$ (at the beginning of each trial, the labels for all $o \in O$ are initialized to 0). The pair $\tilde{o}_t = (o_t, \theta_t)$ then defines a new observation, and the usual reinforcement learning algorithm TD($\lambda$) with replacing traces is applied to $\tilde{O} = O \times \Theta$, as if the pair $\tilde{o}_t = (o_t, \theta_t)$ had the Markov property. (2) Labeling Q-learning was applied to test problems of simple mazes taken from the recent literature. The results demonstrated that labeling Q-learning performs in a near-optimal manner. (3) Most problems have continuous or large discrete observation spaces. We studied generalization techniques based on recurrent neural networks (RNNs) and holon networks, which allow compact storage of similar observations. Further, we developed an approximate method of controlling the complexity, i.e., the Lyapunov exponent, of RNNs, and demonstrated the method by applying it to identification problems for certain nonlinear systems. (4) We conducted fundamental experiments on sensor-based navigation for a mobile robot.
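The labeling idea in item (1) of the abstract can be illustrated with a small sketch. The abstract does not say how labels are updated within a trial, so the rule used here (advance the label of an observation, modulo M, each time the agent re-enters it from a different observation) is an assumption made only for illustration. The five-cell corridor, the hyperparameters, and the names `attach_label`, `run_trial`, and `train` are likewise hypothetical, and the learner is a Sarsa-style control variant of TD(λ) with replacing traces, not necessarily the exact update used in the project.

```python
import random
from collections import defaultdict

M = 3                       # number of labels: Theta = {0, ..., M-1}
ACTIONS = (0, 1)            # 0 = move left, 1 = move right
# Hypothetical corridor: cells 1 and 3 emit the same observation "hall".
OBS = ("start", "hall", "door", "hall", "goal")
GOAL = len(OBS) - 1


def step(pos, action):
    """Move one cell left or right; reward 1.0 on reaching the goal."""
    pos = max(0, pos - 1) if action == 0 else min(GOAL, pos + 1)
    return pos, (1.0 if pos == GOAL else 0.0), pos == GOAL


def attach_label(o, prev_o, labels, visited):
    """Return the augmented observation (o, theta).

    ASSUMED rule (not from the source): when o is re-entered from a
    different observation within the trial, its label advances modulo M.
    """
    if o in visited and o != prev_o:
        labels[o] = (labels[o] + 1) % M
    visited.add(o)
    return (o, labels[o])


def choose(Q, x, eps, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[(x, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(x, a)] == best])


def run_trial(Q, rng, alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, max_steps=200):
    """Run one trial and return the number of steps taken."""
    labels = {o: 0 for o in OBS}      # all labels reset to 0 at trial start
    visited = set()
    traces = defaultdict(float)
    pos = 0
    x = attach_label(OBS[pos], None, labels, visited)
    a = choose(Q, x, eps, rng)
    for t in range(1, max_steps + 1):
        pos, r, done = step(pos, a)
        x2 = attach_label(OBS[pos], x[0], labels, visited)
        a2 = choose(Q, x2, eps, rng)
        target = r if done else r + gamma * Q[(x2, a2)]
        delta = target - Q[(x, a)]
        traces[(x, a)] = 1.0          # replacing trace
        for key in list(traces):
            Q[key] += alpha * delta * traces[key]
            traces[key] *= gamma * lam
        if done:
            return t
        x, a = x2, a2
    return max_steps


def train(trials=200, seed=0):
    """Learn Q over augmented observations; return Q and steps per trial."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    steps = [run_trial(Q, rng) for _ in range(trials)]
    return Q, steps
```

Calling `train()` returns the learned table, keyed by `((observation, label), action)`, together with the number of steps each trial took. On the direct path through this corridor the two "hall" cells receive labels 0 and 1, so the learner can hold different values for them even though the raw observation is identical.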

Report

(3 results)

1998 Annual Research Report Final Research Report Summary
1997 Annual Research Report

Research Products
(22 results)

All Publications (22 results)

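[Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of The Society of Instrument and Control Engineers. 33. 1093-1098 (1997)
- Description
  From the "Final Research Report Summary (Japanese)"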
- Related Report
  1998 Final Research Report Summary
[Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc. of International Workshop on Robot and Human Communication. 52-57 (1997)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. 91. 43-61 (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. 2. 97-101 (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
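[Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of The Society of Instrument and Control Engineers. 35. 138-143 (1999)
- Description
  From the "Final Research Report Summary (Japanese)"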
- Related Report
  1998 Final Research Report Summary
[Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. In press. (1999)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of The Society of Instrument and Control Engineers. Vol.33. 1093-1098 (1997)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc. of International Workshop on Robot and Human Communication. 52-57 (1997)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. Vol.91. 43-61 (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. Vol.2. 97-101 (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of The Society of Instrument and Control Engineers. Vol.35. 138-143 (1999)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. (in press).
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of The Society of Instrument and Control Engineers. 33(11). 1093-1098 (1997)
- Related Report
  1998 Annual Research Report
[Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc. of International Workshop on Robot and Human Communication. 52-57 (1997)
- Related Report
  1998 Annual Research Report
[Publications] Noriyasu Honma: "Adaptive evolution of holon networks by an autonomous decentralized method" Applied Mathematics and Computation. 91(1). 43-61 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Noriyasu Honma: "Effect of complexity on learning ability of recurrent neural networks" Artificial Life and Robotics. 2(3). 97-101 (1998)
- Related Report
  1998 Annual Research Report
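[Publications] Noriyasu Honma: "Complexity Control Methods of Dynamics in Recurrent Neural Networks" Transactions of The Society of Instrument and Control Engineers. 35(1). 138-143 (1999)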
- Related Report
  1998 Annual Research Report
[Publications] Fation Sevrani: "On the synthesis of brain-state-in-a-box neural models with application to associative memory" Neural Computation. In press. (1999)
- Related Report
  1998 Annual Research Report
[Publications] Zhao Feng-ji: "A Mobile Robot Localization Using Ultrasonic Sensors in Indoor Environment" Proc. of International Workshop on Robot and Human Communication. 52-57 (1997)
- Related Report
  1997 Annual Research Report
[Publications] N. Honma: "An Autonomous Criterion of Learning Methods for Recurrent Neural Networks" Proc. of the 2nd Asian Control Conference. II. 219-222 (1997)
- Related Report
  1997 Annual Research Report
[Publications] Ken Kitagawa: "An Emergent Learning Method for Recurrent Neural Network" Transactions of The Society of Instrument and Control Engineers. Vol.33, No.11. 1093-1098 (1997)
- Related Report
  1997 Annual Research Report
[Publications] N. Honma: "A learning method for large-scale recurrent neural networks" Proc. of The 3rd International Symposium on Artificial Life and Robotics. 358-361 (1998)
- Related Report
  1997 Annual Research Report
