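FY2000 Final Research Report Summary

Research on the Autonomous Generation of Memory Mechanisms for Reinforcement Learning in Non-Markovian Environments

Research Project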

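Project/Area Number	11650441
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Control engineering
Research Institution	Tohoku University
Principal Investigator	Kenichi Abe (阿部健一), Tohoku University, Graduate School of Engineering, Professor (70005403)
Co-Investigator	Noriyasu Honma (本間経康), Tohoku University, College of Medical Sciences, Associate Professor (30282023)
Project Period (FY)	1999 – 2000
Keywords	Hidden Markov environment / reinforcement learning / Q-learning / labeling Q-learning / learning automaton / switching Q-learning / hierarchical Q-learning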
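Project Summary	When reinforcement learning (RL) is applied to partially observable Markov (or hidden Markov) environments, the environment state cannot be observed directly. Ordinary RL must therefore be augmented with some form of memory, so that the Q-value tables (Q-modules) are updated in a way that reflects the history of past observation/action pairs. In this research we proposed Labeling Q-learning (LQ-learning) and Switching Q-learning (SQ-learning), studied their performance, and obtained the following results.
(1) LQ-learning is the algorithm we proposed in our previous Grant-in-Aid for Scientific Research (C)(2) project. It leaves many design choices open along two dimensions: when, and in what situations (that is, after which past observation sequences), the label attached to an observation is updated, and how the label value is determined. We therefore formulated a more general framework for LQ-learning, devised a range of algorithms within that framework, and compared their effectiveness by simulation. LQ-learning, however, requires the labeling mechanism to be specified in advance. To automate it, we proposed a labeling method based on a self-organizing map (SOM). The SOM has a one-dimensional structure, and its output serves as the label.
(2) We proposed a hierarchical reinforcement learning method called Switching Q-learning (SQ-learning). Each Q-module is assigned to a subspace of the observation space in which the partially observable Markov environment can locally be regarded as Markovian, so that together the modules cover the environment's entire observation space like a patchwork. In SQ-learning, Q-modules are switched at certain characteristic observations (subgoals), and the sequence of subgoals is learned by a hierarchical learning automaton. In effect, the method aims at the autonomous generation of a memory mechanism. Simulations confirmed that, compared with LQ-learning, this algorithm works effectively on larger-scale problem instances. Building a reinforcement learning mechanism that handles LQ-learning and SQ-learning in a unified way remains future work.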
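The labeling idea in (1) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the report: the input to the 1-D SOM is a one-hot encoding of the last few observations (the hypothetical `history_vector` and `window`), and the label is the index of the winning SOM node. The Q-table is then indexed by (observation, label, action) and updated with the standard one-step Q-learning rule. All names and parameter values are illustrative, not the authors' implementation.

```python
from collections import defaultdict

import numpy as np


class OneDimSOM:
    """A 1-D Kohonen map; the index of the winning node serves as the label."""

    def __init__(self, n_nodes, dim, lr=0.5, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.random((n_nodes, dim))  # one weight vector per node on the chain
        self.lr = lr
        self.sigma = sigma

    def winner(self, x):
        # Node whose weight vector is closest (Euclidean distance) to the input
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def train_step(self, x):
        k = self.winner(x)
        idx = np.arange(len(self.w))
        # Gaussian neighborhood measured along the 1-D chain of nodes
        h = np.exp(-((idx - k) ** 2) / (2.0 * self.sigma ** 2))
        self.w += self.lr * h[:, None] * (x - self.w)
        return k


def history_vector(history, n_obs, window):
    """Hypothetical SOM input: one-hot codes of the last `window` observations."""
    v = np.zeros(window * n_obs)
    for i, o in enumerate(history[-window:]):
        v[i * n_obs + o] = 1.0
    return v


def make_q_table():
    # Unseen (observation, label, action) entries start at 0
    return defaultdict(float)


def lq_update(Q, obs, label, action, reward, next_obs, next_label,
              n_actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning on the augmented state (observation, label)."""
    best_next = max(Q[(next_obs, next_label, b)] for b in range(n_actions))
    key = (obs, label, action)
    Q[key] += alpha * (reward + gamma * best_next - Q[key])
    return Q[key]
```

In use, an agent would call `som.train_step(history_vector(history, n_obs, window))` at each step to obtain the current label and pass the (observation, label) pair to `lq_update`. A hand-designed labeling rule, as in the original LQ-learning, would simply take the place of the SOM call.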

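The switching idea in (2) can likewise be sketched in Python, under simplifying assumptions that are ours rather than the report's. The hierarchical learning automaton is replaced by one flat automaton per context (the start of an episode or the last subgoal reached), each using the standard linear reward-inaction (L_R-I) update to pick the next Q-module. The subgoal observations are given as input, and reaching a subgoal is treated as the end of the active module's task. The class names, the subgoal set, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


class RewardInactionAutomaton:
    """Learning automaton with the linear reward-inaction (L_R-I) scheme."""

    def __init__(self, n_choices, lam=0.1, seed=0):
        self.p = [1.0 / n_choices] * n_choices  # choice probabilities
        self.lam = lam
        self.rng = random.Random(seed)

    def choose(self):
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, chosen, success):
        if not success:
            return  # "inaction": probabilities unchanged on failure
        for i in range(len(self.p)):
            if i == chosen:
                self.p[i] += self.lam * (1.0 - self.p[i])
            else:
                self.p[i] -= self.lam * self.p[i]


class SwitchingQAgent:
    """Several Q-modules; the active one is switched at subgoal observations."""

    def __init__(self, n_modules, n_actions, subgoals,
                 alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        self.Q = [defaultdict(float) for _ in range(n_modules)]
        self.n_modules = n_modules
        self.n_actions = n_actions
        self.subgoals = set(subgoals)  # assumed given (characteristic observations)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(seed)
        self.selectors = {}  # context -> automaton choosing the next module
        self.chosen = []     # (context, module) choices made this episode
        self.active = self._select(None)

    def _select(self, context):
        # One automaton per context (None = episode start, else last subgoal)
        if context not in self.selectors:
            self.selectors[context] = RewardInactionAutomaton(
                self.n_modules, seed=len(self.selectors))
        m = self.selectors[context].choose()
        self.chosen.append((context, m))
        return m

    def act(self, obs):
        # Epsilon-greedy action using the active module only
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        q = self.Q[self.active]
        return max(range(self.n_actions), key=lambda a: q[(obs, a)])

    def learn(self, obs, action, reward, next_obs):
        q = self.Q[self.active]
        at_subgoal = next_obs in self.subgoals
        # A subgoal ends the active module's task, so no bootstrapping there
        future = 0.0 if at_subgoal else max(
            q[(next_obs, b)] for b in range(self.n_actions))
        q[(obs, action)] += self.alpha * (
            reward + self.gamma * future - q[(obs, action)])
        if at_subgoal:
            self.active = self._select(next_obs)  # switch Q-module

    def end_episode(self, success):
        # Reinforce the module sequence of a successful episode
        for context, m in self.chosen:
            self.selectors[context].update(m, success)
        self.chosen = []
        self.active = self._select(None)
```

Keying each automaton on the last subgoal reached lets the sketch learn an ordering of modules rather than just a preference among them. It is still much simpler than the hierarchical automaton described in the report, which may be organized quite differently.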
Research Results (28 items)

[Bibliography] Alireza Fatehi: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol. IV. 1074-1077 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Hae Yeon Lee: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol. V. 487-491 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Masao Sakai: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol. I. 484-489 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Noriyasu Honma: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol. III. 211-216 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Fation Sevrani: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Masao Sakai: "Complexity control method by stochastic analysis for recurrent neural networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol. 1. 281-284 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Hae Yeon Lee: "Labeling Q-learning for partially observable Markov decision process environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol. 2. 484-490 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Hae Yeon Lee: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. Vol. 2. 484-487 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Masao Sakai: "Complexity Control Method of Chaos Dynamics In Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. Vol. 1. 494-497 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Hiroyuki Kamaya: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol. 2. 1062-1067 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Hae Yeon Lee: "Labeling Q-Learning In Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 1. 208-211 (2001)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Masao Sakai: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 2. 478-481 (2001)
- Description
  From the Final Research Report Summary (Japanese)
[Bibliography] Alireza Fatehi: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 2. 528-531 (2001)
- Description
  From the Final Research Report Summary (Japanese)
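[Bibliography] Masao Sakai (酒井正夫): "Control of Chaos Dynamics in Neural Networks" (神経回路網におけるカオスダイナミクスの制御) Trans. of the Society of Instrument and Control Engineers (計測自動制御学会論文集). (in press). (2001)
- Description
  From the Final Research Report Summary (Japanese)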
[Bibliography] Alireza Fatehi and Kenichi Abe: "Convergence of SOM Multiple Models Identifier" Proc. of 1999 IEEE International Conference on SMC. Vol. IV. 1074-1077 (1999)
- Description
  From the Final Research Report Summary (English)
[Bibliography] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning For Non-Markovian Environments" Proc. of 1999 IEEE International Conference on SMC. Vol. V. 487-491 (1999)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method for Recurrent Neural Networks" Proc. of 1999 IEEE International Conference on SMC. Vol. I. 484-489 (1999)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Noriyasu Honma, Toshiyuki Kamauti, Kenichi Abe, and Hiroshi Takeda: "Auto-Learning by Dynamical Recognition Networks" Proc. of 1999 IEEE International Conference on SMC. Vol. II. 211-216 (1999)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Fation Sevrani and Kenichi Abe: "On the Synthesis of Brain-State-in-a-Box Neural Models with Application to Associative Memory" Neural Computation. 12. 451-472 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method by Stochastic Analysis for Recurrent Neural Networks" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol. 1. 281-284 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning for Partially Observable Markov Decision Process Environments" Proc. of Fifth Int. Symp. on Artificial Life and Robotics. Vol. 2. 484-490 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-learning for Maze Problems with Partially Observable States" Proc. of 15th Korea Automatic Control Conference. 484-487 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Method of Chaos Dynamics In Recurrent Neural Networks" Proc. of 15th Korea Automatic Control Conference. 494-497 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Hiroyuki Kamaya, HaeYeon Lee, and Kenichi Abe: "Switching Q-learning in Partially Observable Markovian Environments" Proc. of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vol. 2. 1062-1067 (2000)
- Description
  From the Final Research Report Summary (English)
[Bibliography] HaeYeon Lee, Hiroyuki Kamaya, and Kenichi Abe: "Labeling Q-Learning In Hidden State Environments" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 1. 208-211 (2001)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Chaos Control by a Stochastic Analysis on Recurrent Neural Networks" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 2. 478-481 (2001)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Alireza Fatehi and Kenichi Abe: "Self-organizing map neural network as a multiple model identifier for time-varying systems" Proc. of Sixth Int. Symp. on Artificial Life and Robotics. Vol. 2. 528-531 (2001)
- Description
  From the Final Research Report Summary (English)
[Bibliography] Masao Sakai, Noriyasu Honma, and Kenichi Abe: "Complexity Control Methods of Chaos Dynamics in Recurrent Neural Networks" Trans. of the Society of Instrument and Control Engineers. (in press).
- Description
  From the Final Research Report Summary (English)
