Speed and Fusion of Evolutionary Computation and Reinforcement Learning by Importance Sampling

Research Project

Project/Area Number	16300040
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Tokyo Institute of Technology
Principal Investigator	KOBAYASHI Shigenobu, Interdisciplinary School of Science and Engineering, Professor (40016697)
Co-Investigator(Kenkyū-buntansha)	SAKUMA Jun, Interdisciplinary School of Science and Engineering, Research Associate (90376963); KIMURA Hajime (木村元), Kyushu University, Graduate School of Engineering, Associate Professor (40302963)
Project Period (FY)	2004 – 2006
Project Status	Completed (Fiscal Year 2006)
Budget Amount	¥15,000,000 (Direct Cost: ¥15,000,000); Fiscal Year 2006: ¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 2005: ¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 2004: ¥7,800,000 (Direct Cost: ¥7,800,000)
Keywords	Evolutionary Computation / Genetic Algorithms / Real-coded Genetic Algorithms / Reinforcement Learning / Importance Sampling / Instance-based Policy / Multi-objective Optimization / Hybrid Genetic Algorithms / Policy Making / Global Optimization / Pareto Descent Method / Actor-Critic / Policy Learning / Policy Gradient Method / Stochastic Gradient Ascent
Research Abstract	Reinforcement learning addresses policy search problems: finding a mapping from the state space to the action space. However, reinforcement learning is based on gradient methods and therefore cannot handle problems with multimodal landscapes. Genetic algorithms, by contrast, are promising for such problems, but they appear unsuitable for policy search because evaluating each candidate policy is costly. We incorporate importance sampling into the framework of genetic algorithms to reduce the evaluation cost of policy search. The proposed method was successfully applied to a Markov decision process with a multimodal landscape.

Reinforcement learning is a useful tool for complex control problems that can be neither modeled mathematically nor solved theoretically. However, traditional value-function approaches such as Q-learning suffer from combinatorial explosion. Direct policy search is an alternative approach that represents a policy with a model and searches the model's parameter space directly for an optimum using optimization techniques such as genetic algorithms. An instance-based policy is one such representation: it represents a policy as a set of instances, each a pair of a state and an action. We present a hybrid GA that efficiently optimizes a set of instances with continuous states and continuous actions for a given episodic task. The proposed method, named SLIP, was applied to a cat-twist problem and a parallel-type double inverted pendulum problem, and experiments show its effectiveness and usefulness.

Genetic algorithms have attracted much attention as a potent multi-objective optimization method, and the effectiveness of hybridizing them with local search has recently been reported. However, existing local search methods have drawbacks such as high computational cost and inefficient improvement of the objective functions. We introduce the concept of Pareto descent directions: descent directions for which no other descent direction is superior in improving all objective functions. Moving solutions in such directions is expected to improve all objective functions simultaneously and maximally. We propose a new local search method, the Pareto descent method, which finds Pareto descent directions and moves solutions in those directions. When some or all of these directions are infeasible, it instead finds feasible Pareto descent directions, or descent directions, as necessary, and moves solutions in those directions. The method finds these directions by solving linear programming problems, so it is computationally inexpensive. Experiments show that the Pareto descent method is superior to existing methods.
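The first paragraph of the abstract describes reusing evaluation samples through importance sampling. The Python sketch below illustrates the general idea under stated assumptions; it is not the authors' algorithm. It assumes a hypothetical linear-Gaussian policy `a ~ N(theta0 + theta1 * s, 1)` and a toy one-step task (the hypothetical `collect_episodes`) whose return is `-(a - 1)^2`. Episodes sampled once under a parent policy are reweighted by the likelihood ratio of a candidate (offspring) policy, so the candidate's expected return can be estimated without running new episodes. The environment's transition probabilities do not depend on the policy, so they cancel out of the ratio.

```python
import math
import random
from typing import List, Tuple

# Hypothetical types for illustration: a step is (state, action) and an
# episode is (list of steps, total return).
Step = Tuple[float, float]
Episode = Tuple[List[Step], float]
Theta = Tuple[float, float]


def log_prob(theta: Theta, state: float, action: float, std: float = 1.0) -> float:
    """Log-density of an assumed linear-Gaussian policy a ~ N(theta0 + theta1 * s, std^2)."""
    mean = theta[0] + theta[1] * state
    return -0.5 * ((action - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def collect_episodes(theta: Theta, n: int, rng: random.Random) -> List[Episode]:
    """Toy one-step task (assumption): the state is always 0 and the return is -(a - 1)^2."""
    episodes = []
    for _ in range(n):
        state = 0.0
        action = rng.gauss(theta[0] + theta[1] * state, 1.0)
        episodes.append(([(state, action)], -(action - 1.0) ** 2))
    return episodes


def is_estimate(episodes: List[Episode], theta_new: Theta, theta_old: Theta,
                self_normalized: bool = False) -> float:
    """Estimate the expected return of theta_new from episodes sampled under theta_old.

    Each episode's weight is the product over its steps of pi_new(a|s) / pi_old(a|s),
    computed in log space; transition probabilities cancel out of this ratio.
    """
    weights = []
    for steps, _ in episodes:
        log_w = sum(log_prob(theta_new, s, a) - log_prob(theta_old, s, a) for s, a in steps)
        weights.append(math.exp(log_w))
    weighted = sum(w * ret for w, (_, ret) in zip(weights, episodes))
    if self_normalized:
        return weighted / sum(weights)
    return weighted / len(episodes)
```

For example, with `samples = collect_episodes((0.0, 0.0), 100_000, random.Random(0))`, the call `is_estimate(samples, (0.5, 0.0), (0.0, 0.0))` should return a value close to the candidate's true expected return, `-((0.5 - 1)^2 + 1) = -1.25`, without any new episodes. The plain estimator is unbiased; the self-normalized variant is slightly biased but often has lower variance when the weights vary widely, which tends to happen as the candidate moves away from the sampling policy.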
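The second paragraph of the abstract describes an instance-based policy as a set of state-action pairs. A minimal Python sketch follows, assuming the common nearest-neighbour rule (take the action of the instance whose state is closest to the current state) and a hypothetical 1-D task for `rollout`; SLIP's actual action-selection rule and hybrid-GA operators are not reproduced here. In a GA, an individual would be the whole list of instances and its fitness the episode return.

```python
import math
from typing import Sequence, Tuple

# An instance pairs a (continuous) state with the (continuous) action taken there.
Instance = Tuple[Tuple[float, ...], Tuple[float, ...]]


def select_action(instances: Sequence[Instance], state: Sequence[float]) -> Tuple[float, ...]:
    """Return the action of the instance whose state is nearest (Euclidean) to `state`.

    Nearest-neighbour selection is an assumption; ties go to the earlier instance.
    """
    if not instances:
        raise ValueError("an instance-based policy needs at least one instance")
    _, action = min(instances, key=lambda inst: math.dist(inst[0], state))
    return action


def rollout(instances: Sequence[Instance], x0: float, steps: int) -> float:
    """Episode return on a hypothetical 1-D task: x <- x + a, reward -|x| per step."""
    x, total = x0, 0.0
    for _ in range(steps):
        (a,) = select_action(instances, (x,))
        x += a
        total += -abs(x)
    return total
```

For instance, with the instances `[((-5.0,), (1.0,)), ((5.0,), (-1.0,)), ((0.0,), (0.0,))]`, `rollout(policy, 3.0, 3)` moves from x = 3 to x = 2 and then stays there, giving a return of -6.0. A GA could then try to raise this fitness by modifying, adding, or removing instances.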
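The third paragraph of the abstract defines Pareto descent directions. The brute-force Python sketch below only illustrates that definition for two-dimensional decision variables. It does not implement the Pareto descent method itself, which the abstract says finds such directions by solving linear programming problems. Assumptions: candidate directions are unit vectors sampled at evenly spaced angles; a direction is a descent direction for all objectives when every directional derivative `g_i · d` is negative; and one direction is superior to another when its directional derivatives are no larger for every objective and strictly smaller for at least one. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def directional_derivatives(grads: Sequence[Vec2], d: Vec2) -> Tuple[float, ...]:
    """Directional derivative g_i . d of each objective along direction d."""
    return tuple(g[0] * d[0] + g[1] * d[1] for g in grads)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is no worse than v everywhere and strictly better somewhere (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def pareto_descent_directions_2d(grads: Sequence[Vec2], n_samples: int = 360) -> List[Vec2]:
    """Brute-force illustration: sampled unit directions that decrease every objective
    and are not dominated by any other sampled direction."""
    candidates = []
    for k in range(n_samples):
        angle = 2.0 * math.pi * k / n_samples
        d = (math.cos(angle), math.sin(angle))
        derivs = directional_derivatives(grads, d)
        if all(v < 0.0 for v in derivs):  # descent direction for every objective
            candidates.append((d, derivs))
    return [
        d for d, dv in candidates
        if not any(dominates(other, dv) for _, other in candidates)
    ]
```

With gradients `(1, 0)` and `(0, 1)`, the surviving directions are those in the open third quadrant (up to floating-point rounding at its edges), including the bisector `(-√0.5, -√0.5)`: improving one objective faster necessarily improves the other more slowly, so none dominates another. With two identical gradients only the steepest-descent direction remains, and with opposing gradients `(1, 0)` and `(-1, 0)` there is no common descent direction, so the result is empty.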

Report

2006 Annual Research Report
2006 Final Research Report Summary
2005 Annual Research Report
2004 Annual Research Report

Research Products

[Journal Article] An Extension of the Rational Policy Making Algorithm to Continuous State Spaces (2007)
- Author(s)
  宮崎和, 木村元, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.22, No.3
  
  Pages: 332-341
- NAID
  10022007639
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Instance-based Policy Learning with a Hybrid GA: Proposal and Evaluation of SLIP (2006)
- Author(s)
  土谷千加夫, 塩川祐介, 池田心, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Society of Instrument and Control Engineers Vol.42, No.12
  
  Pages: 1344-1352
- NAID
  10018422330
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Saving MGG: Reducing Fitness Evaluations for Real-coded GA/MGG (2006)
- Author(s)
  田中雅晴, 土谷千加夫, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21, No.6
  
  Pages: 547-555
- NAID
  10022006907
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Local Search for Multiobjective Function Optimization: Pareto Descent Method (2006)
- Author(s)
  原田健, 佐久間淳, 池田心, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21, No.4
  
  Pages: 340-350
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization: Recommendation of GA then LS (2006)
- Author(s)
  原田健, 池田心, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21, No.6
  
  Pages: 482-492
- NAID
  10022566669
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Saving MGG: Reducing Fitness Evaluations for Real-coded GA/MGG (2006)
- Author(s)
  Tanaka, M., Tsuchiya, C., Sakuma, J., Ono, I., Kobayashi, S.
- Journal Title
  
  Journal of Japanese Society for Artificial Intelligence Vol.21, No.6
  
  Pages: 547-555
- NAID
  10022006907
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] SLIP: A Sophisticated Learner for Instance-based Policy using Hybrid GA (2006)
- Author(s)
  Tsuchiya, C., Shiokawa, Y., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
- Journal Title
  
  Transactions of Society of Instrument and Control Engineers Vol.42, No.12
  
  Pages: 1344-1352
- NAID
  10018422330
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Local Search for Multiobjective Function Optimization: Pareto Descent Method (2006)
- Author(s)
  Harada, K., Sakuma, J., Ikeda, K., Ono, I., Kobayashi, S.
- Journal Title
  
  Journal of Japanese Society for Artificial Intelligence Vol.21, No.4
  
  Pages: 340-350
- NAID
  10022006535
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization: Recommendation of GA then LS (2006)
- Author(s)
  Harada, K., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
- Journal Title
  
  Journal of Japanese Society for Artificial Intelligence Vol.21, No.6
  
  Pages: 482-492
- NAID
  10022006759
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Instance-based Policy Learning with a Hybrid GA: Proposal and Evaluation of SLIP (2006)
- Author(s)
  土谷千加夫, 塩川裕介, 池田心, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Society of Instrument and Control Engineers Vol.42
  
  Pages: 1344-1352
- Related Report
  2006 Annual Research Report
[Journal Article] Saving MGG: Reducing Fitness Evaluations for Real-coded GA/MGG (2006)
- Author(s)
  田中雅晴, 土谷千加夫, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21
  
  Pages: 547-555
- NAID
  10022006907
- Related Report
  2006 Annual Research Report
[Journal Article] Local Search for Multiobjective Function Optimization: Pareto Descent Method (2006)
- Author(s)
  原田健, 佐久間淳, 池田心, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21
  
  Pages: 340-350
- NAID
  10022006535
- Related Report
  2006 Annual Research Report
[Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization: Recommendation of GA then LS (2006)
- Author(s)
  原田健, 池田心, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.21
  
  Pages: 482-492
- NAID
  10022566669
- Related Report
  2006 Annual Research Report
[Journal Article] Learning to Travel over Uneven Ground with a Shape-Variable Robot (2006)
- Author(s)
  藤野智宏, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 18th Autonomous Decentralized Systems Symposium
  
  Pages: 105-110
- NAID
  10022566209
- Related Report
  2005 Annual Research Report
[Journal Article] Planning and Reinforcement Learning of Grasping and Manipulation Motions (2006)
- Author(s)
  石見幸樹, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 18th Autonomous Decentralized Systems Symposium
  
  Pages: 143-148
- NAID
  10022566265
- Related Report
  2005 Annual Research Report
[Journal Article] Local Search for Multiobjective Optimization: Pareto Descent Method (2006)
- Author(s)
  原田健, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 18th Autonomous Decentralized Systems Symposium
  
  Pages: 351-356
- NAID
  10022566657
- Related Report
  2005 Annual Research Report
[Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization: Recommendation of GA then LS (2006)
- Author(s)
  原田健, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 18th Autonomous Decentralized Systems Symposium
  
  Pages: 357-362
- Related Report
  2005 Annual Research Report
[Journal Article] Learning for a Biped Walking Robot Using Human Joint Trajectory Data and Inverse Kinematics (2006)
- Author(s)
  村田栄理, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 38th Meeting of the Systems Engineering Division
  
  Pages: 93-98
- Related Report
  2005 Annual Research Report
[Journal Article] Optimization of Instance-based Policies by Real-coded GA (2006)
- Author(s)
  土谷千加夫, 佐久間淳, 小野功, 小林重信
- Journal Title
  
  Proceedings of the 33rd Intelligent Systems Symposium
  
  Pages: 43-48
- Related Report
  2005 Annual Research Report
[Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)
- Author(s)
  土谷千加夫, 木村元, 佐久間淳, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.20, No.1
  
  Pages: 1-10
- NAID
  10022004767
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)
- Author(s)
  青木圭, 池田心, 木村元, 小林重信
- Journal Title
  
  Transactions of the Institute of Systems, Control and Information Engineers Vol.18, No.1
  
  Pages: 81-88
- NAID
  10014507798
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Fusion of Soft Computing and Hard Computing for Large-scale Plants: A General Model (2005)
- Author(s)
  Kamiya, A., Ovaska, S.J., Roy, S., Kobayashi, S.
- Journal Title
  
  Applied Soft Computing Journal Vol.5, No.3
  
  Pages: 265-279
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)
- Author(s)
  Tsuchiya, C., Kimura, H., Sakuma, J., Kobayashi, S.
- Journal Title
  
  Journal of Japanese Society for Artificial Intelligence Vol.20, No.1
  
  Pages: 1-10
- NAID
  10022004767
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)
- Author(s)
  Aoki, K., Ikeda, K., Kimura, H., Kobayashi, S.
- Journal Title
  
  Journal of Institute of Systems, Control and Information Engineers Vol.18, No.3
  
  Pages: 81-88
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] An Extension of the Rational Policy Making Algorithm to Continuous State Spaces (2005)
- Author(s)
  Miyazaki, K., Kimura, H., Kobayashi, S.
- Journal Title
  
  Journal of Japanese Society for Artificial Intelligence Vol.22, No.3
  
  Pages: 332-341
- NAID
  10022007639
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Fusion of Soft Computing and Hard Computing for Large-scale Plants: A General Model (2005)
- Author(s)
  Kamiya, A., Ovaska, S.J., Roy, S., Kobayashi, S.
- Journal Title
  
  Applied Soft Computing Journal Vol.5, No.3
  
  Pages: 265-279
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)
- Author(s)
  土谷千加夫, 木村元, 佐久間淳, 小林重信
- Journal Title
  
  Transactions of the Japanese Society for Artificial Intelligence Vol.20, No.1A
  
  Pages: 1-10
- NAID
  10022004767
- Related Report
  2004 Annual Research Report
[Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)
- Author(s)
  青木圭, 池田心, 木村元, 小林重信
- Journal Title
  
  Transactions of the Institute of Systems, Control and Information Engineers Vol.18, No.3
  
  Pages: 81-88
- NAID
  10014507798
- Related Report
  2004 Annual Research Report
[Journal Article] Offspring Generation by Policy Gradient Estimation using Importance Sampling (2005)
- Author(s)
  土谷千加夫, 木村元, 小林重信
- Journal Title
  
  Proceedings of the 31st SICE Intelligent Systems Symposium
  
  Pages: 145-150
- Related Report
  2004 Annual Research Report
[Journal Article] Policy Representation and Learning Efficiency in Policy Search by GA (2005)
- Author(s)
  土谷千加夫, 佐久間淳, 小林重信
- Journal Title
  
  Proceedings of the 17th SICE Autonomous Decentralized Systems Symposium
  
  Pages: 289-294
- Related Report
  2004 Annual Research Report
[Journal Article] Distributed Reinforcement Learning using Bi-directional Decision Making for Multi-criteria Control of Multi-Stage Flow Systems (2004)
- Author(s)
  Aoki, K., Kimura, H., Kobayashi, S.
- Journal Title
  
  Proc. of the 8th Conf. on Intelligent Autonomous Systems
  
  Pages: 281-290
- Related Report
  2004 Annual Research Report
[Journal Article] Policy Learning by GA using Importance Sampling (2004)
- Author(s)
  Tsuchiya, T., Kimura, H., Kobayashi, S.
- Journal Title
  
  Proc. of the 8th Conf. on Intelligent Autonomous Systems
  
  Pages: 385-394
- Related Report
  2004 Annual Research Report
