
Speed and Fusion of Evolutionary Computation and Reinforcement Learning by Importance Sampling

Research Project

Project/Area Number 16300040
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Tokyo Institute of Technology

Principal Investigator

KOBAYASHI Shigenobu  Interdisciplinary School of Science and Engineering, Professor (40016697)

Co-Investigator(Kenkyū-buntansha) SAKUMA Jun  Interdisciplinary School of Science and Engineering, Research Associate (90376963)
KIMURA Hajime (木村 元)  Kyushu University, Graduate School of Engineering, Associate Professor (40302963)
Project Period (FY) 2004 – 2006
Project Status Completed (Fiscal Year 2006)
Budget Amount
¥15,000,000 (Direct Cost: ¥15,000,000)
Fiscal Year 2006: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2005: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2004: ¥7,800,000 (Direct Cost: ¥7,800,000)
Keywords Evolutionary Computation / Genetic Algorithms / Real-coded Genetic Algorithms / Reinforcement Learning / Importance Sampling / Instance-based Policy / Multi-objective Optimization / Hybrid Genetic Algorithms / Policy Making / Global Optimization / Pareto Descent Method / Actor-Critic / Policy Learning / Policy Gradient Method / Stochastic Gradient Method
Research Abstract

Reinforcement learning addresses policy search problems, that is, searching for a mapping from the state space to the action space. However, because reinforcement learning is based on gradient methods, it cannot cope with problems that have a multimodal landscape. Genetic algorithms, in contrast, are promising for such problems, but they appear unsuitable for policy search because of the cost of evaluation. We incorporate importance sampling into the genetic algorithm framework in order to reduce the evaluation cost of policy search. The proposed method was successfully applied to Markov decision processes with multimodal landscapes.
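As a rough illustration of how importance sampling can reduce evaluation cost, the sketch below reuses episodes sampled under one policy to estimate the expected return of another policy, so that a new GA individual need not be run in the environment. This is a generic per-episode importance sampling estimator, not the project's actual algorithm; the names `importance_weight` and `estimate_return` and the tabular policy interface are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, int]                # (state, action), hypothetical discrete encoding
Episode = Tuple[List[Step], float]    # (trajectory, total return of the episode)
Policy = Callable[[int, int], float]  # policy(state, action) -> probability of action


def importance_weight(steps: List[Step], target: Policy, behavior: Policy) -> float:
    """Product over the trajectory of target/behavior action probabilities.

    Assumes the behavior policy gave nonzero probability to every action taken.
    """
    weight = 1.0
    for state, action in steps:
        weight *= target(state, action) / behavior(state, action)
    return weight


def estimate_return(episodes: List[Episode], target: Policy, behavior: Policy) -> float:
    """Estimate the target policy's expected return from behavior-policy episodes."""
    total = 0.0
    for steps, episode_return in episodes:
        total += importance_weight(steps, target, behavior) * episode_return
    return total / len(episodes)


def uniform(state: int, action: int) -> float:
    """Behavior policy for the example: two actions, equally likely."""
    return 0.5


def greedy_ish(state: int, action: int) -> float:
    """Target policy for the example: prefers action 1."""
    return 0.9 if action == 1 else 0.1


# Two one-step episodes collected under the uniform policy.
sampled = [([(0, 1)], 1.0), ([(0, 0)], 0.0)]
estimate = estimate_return(sampled, greedy_ish, uniform)
```

In a GA setting, an estimate of this kind could serve as the fitness of a candidate policy, with fresh episodes sampled only when the weights become too uneven to trust; that refinement is outside this sketch.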
Reinforcement learning is a useful tool for complex control problems that can be neither modeled mathematically nor solved theoretically. However, traditional value-function approaches such as Q-learning suffer from combinatorial explosion. Direct policy search is an alternative approach that represents a policy with some model and searches the parameter space directly for an optimum using optimization techniques such as genetic algorithms. An instance-based policy is one such policy representation model: it represents a policy as a set of instances, each of which is a pair of a state and an action. We present a hybrid GA that efficiently optimizes a set of instances with continuous states and continuous actions for a given episodic task. The proposed method, named SLIP, was applied to a cat twist problem and a parallel-type double inverted pendulum problem. Experiments show the effectiveness and usefulness of SLIP.
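A minimal sketch of an instance-based policy follows. The abstract states only that the policy is a set of state-action pairs; selecting the action of the nearest stored state (by Euclidean distance) is an assumption made for this example, and `select_action` is a hypothetical name, not part of SLIP.

```python
import math
from typing import List, Sequence, Tuple

# An instance pairs a continuous state vector with a continuous action vector.
Instance = Tuple[Sequence[float], Sequence[float]]


def select_action(instances: List[Instance], state: Sequence[float]) -> Sequence[float]:
    """Return the action of the instance whose state is nearest to `state`.

    Nearest-neighbor lookup is an assumed selection rule for illustration.
    """
    if not instances:
        raise ValueError("the policy needs at least one instance")
    _, nearest_action = min(instances, key=lambda inst: math.dist(inst[0], state))
    return nearest_action


# A toy policy with two instances in a 2-D state space and a 1-D action space.
policy = [
    ((0.0, 0.0), (1.0,)),
    ((1.0, 1.0), (-1.0,)),
]
action_near_origin = select_action(policy, (0.1, 0.2))
action_near_corner = select_action(policy, (0.9, 0.8))
```

Under this representation, the states and actions of all instances form the real-valued parameters that an optimizer such as a GA would adjust.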
Much attention has been paid to genetic algorithms as a potent multi-objective optimization method, and the effectiveness of hybridizing them with local search has recently been reported. However, existing local search methods have drawbacks such as high computational cost and inefficiency in improving the objective functions. We introduce the concept of Pareto descent directions: descent directions to which no other descent direction is superior in improving all objective functions. Moving solutions in such directions is expected to maximally improve all objective functions simultaneously. We propose a new local search method, the Pareto descent method, which finds Pareto descent directions and moves solutions in those directions. When some or all of these directions are infeasible, it finds feasible Pareto descent directions or descent directions as necessary and moves solutions in these directions. The proposed method finds these directions by solving linear programming problems and is therefore computationally inexpensive. Experiments have shown that the Pareto descent method is superior to the existing methods.
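The notion of a direction that improves all objectives at once can be sketched as follows, assuming minimization. This example only tests whether a direction has a negative directional derivative for every objective and builds one such direction for the two-objective case from the normalized gradients. It does not reproduce the linear programming formulation of the Pareto descent method, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def is_common_descent(direction: Vector, gradients: List[Vector]) -> bool:
    """True if moving along `direction` decreases every objective to first order."""
    return all(dot(direction, g) < 0.0 for g in gradients)


def normalized(v: Vector) -> List[float]:
    """Scale a nonzero vector to unit length (a zero gradient is not handled)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def bisector_direction(g1: Vector, g2: Vector) -> List[float]:
    """Negative sum of the two unit gradients.

    This is a common descent direction unless the gradients point in exactly
    opposite directions, in which case the result is the zero vector.
    """
    u1, u2 = normalized(g1), normalized(g2)
    return [-(a + b) for a, b in zip(u1, u2)]


# Two objectives whose gradients at the current point are orthogonal.
grads = [[1.0, 0.0], [0.0, 1.0]]
step = bisector_direction(grads[0], grads[1])
improves_both = is_common_descent(step, grads)
```

When the gradients conflict exactly, no direction improves both objectives to first order, which corresponds to the current point already being locally Pareto optimal for the two objectives.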

Report

(4 results)
  • 2006 Annual Research Report   Final Research Report Summary
  • 2005 Annual Research Report
  • 2004 Annual Research Report
  • Research Products

    (32 results)


  • [Journal Article] An Extension of the Rational Policy Making Algorithm to Continuous State Spaces (2007)

    • Author(s)
      宮崎和, 木村元, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.22, No.3

      Pages: 332-341

    • NAID

      10022007639

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Instance-based Policy Learning by Hybrid GA : Proposal and Evaluation of SLIP (2006)

    • Author(s)
      土谷千加夫, 塩川祐介, 池田心, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Transactions of Society of Instrument and Control Engineers Vol.42, No.12

      Pages: 1344-1352

    • NAID

      10018422330

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Saving MGG : Reducing Fitness Evaluations for Real-coded GA/MGG (2006)

    • Author(s)
      田中雅晴, 土谷千加夫, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.6

      Pages: 547-555

    • NAID

      10022006907

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Local Search for Multiobjective Function Optimization : Pareto Descent Method (2006)

    • Author(s)
      原田健, 佐久間淳, 池田心, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.4

      Pages: 340-350

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization : Recommendation of GA then LS (2006)

    • Author(s)
      原田健, 池田心, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.6

      Pages: 482-492

    • NAID

      10022566669

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Saving MGG : Reducing Fitness Evaluations for Real-coded GA/MGG (2006)

    • Author(s)
      Tanaka, M., Tsuchiya, C., Sakuma, J., Ono, I., Kobayashi, S.
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.6

      Pages: 547-555

    • NAID

      10022006907

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] SLIP : A Sophisticated Learner for Instance-based Policy using Hybrid GA (2006)

    • Author(s)
      Tsuchiya, C., Shiokawa, Y., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
    • Journal Title

      Transactions of Society of Instrument and Control Engineers Vol.42, No.12

      Pages: 1344-1352

    • NAID

      10018422330

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Local Search for Multiobjective Function Optimization : Pareto Descent Method (2006)

    • Author(s)
      Harada, K., Sakuma, J., Ikeda, K., Ono, I., Kobayashi, S.
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.4

      Pages: 340-350

    • NAID

      10022006535

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization : Recommendation of GA then LS (2006)

    • Author(s)
      Harada, K., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21, No.6

      Pages: 482-492

    • NAID

      10022006759

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Instance-based Policy Learning by Hybrid GA : Proposal and Evaluation of SLIP (2006)

    • Author(s)
      土谷千加夫, 塩川裕介, 池田心, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Transactions of Society of Instrument and Control Engineers Vol.42

      Pages: 1344-1352

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Saving MGG : Reducing Fitness Evaluations for Real-coded GA/MGG (2006)

    • Author(s)
      田中雅晴, 土谷千加夫, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21

      Pages: 547-555

    • NAID

      10022006907

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Local Search for Multiobjective Function Optimization : Pareto Descent Method (2006)

    • Author(s)
      原田健, 佐久間淳, 池田心, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21

      Pages: 340-350

    • NAID

      10022006535

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization : Recommendation of GA then LS (2006)

    • Author(s)
      原田健, 池田心, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.21

      Pages: 482-492

    • NAID

      10022566669

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Learning to Traverse Uneven Terrain with a Shape-changing Robot (2006)

    • Author(s)
      藤野智宏, 佐久間淳, 小野 功, 小林重信
    • Journal Title

      Proceedings of the 18th Symposium on Autonomous Decentralized Systems

      Pages: 105-110

    • NAID

      10022566209

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Planning and Reinforcement Learning of Grasping and Manipulation Motions (2006)

    • Author(s)
      石見幸樹, 佐久間淳, 小野 功, 小林重信
    • Journal Title

      Proceedings of the 18th Symposium on Autonomous Decentralized Systems

      Pages: 143-148

    • NAID

      10022566265

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Local Search for Multiobjective Optimization : Pareto Descent Method (2006)

    • Author(s)
      原田 健, 佐久間淳, 小野 功, 小林重信
    • Journal Title

      Proceedings of the 18th Symposium on Autonomous Decentralized Systems

      Pages: 351-356

    • NAID

      10022566657

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization : Recommendation of GA then LS (2006)

    • Author(s)
      原田 健, 佐久間淳, 小野 功, 小林重信
    • Journal Title

      Proceedings of the 18th Symposium on Autonomous Decentralized Systems

      Pages: 357-362

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Learning for a Biped Walking Robot Using Human Joint Trajectory Data and Inverse Kinematics (2006)

    • Author(s)
      村田栄理, 佐久間淳, 小野 功, 小林重信
    • Journal Title

      Proceedings of the 38th Workshop of the Systems Engineering Technical Division

      Pages: 93-98

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Optimization of Instance-based Policy by Real-coded GA (2006)

    • Author(s)
      土谷千加夫, 佐久間淳, 小野功, 小林重信
    • Journal Title

      Proceedings of the 33rd Intelligent Systems Symposium

      Pages: 43-48

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)

    • Author(s)
      土谷千加夫, 木村元, 佐久間淳, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.20, No.1

      Pages: 1-10

    • NAID

      10022004767

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)

    • Author(s)
      青木圭, 池田心, 木村元, 小林重信
    • Journal Title

      Journal of Institute of Systems, Control and Information Engineers Vol.18, No.1

      Pages: 81-88

    • NAID

      10014507798

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Fusion of Soft Computing and Hard Computing for Large-scale Plants : A General Model (2005)

    • Author(s)
      Kamiya, A., Ovaska, S.J., Roy, S., Kobayashi, S.
    • Journal Title

      Applied Soft Computing Journal Vol.5, No.3

      Pages: 265-279

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)

    • Author(s)
      Tsuchiya, C., Kimura, H., Sakuma, J., Kobayashi, S.
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.20, No.1

      Pages: 1-10

    • NAID

      10022004767

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)

    • Author(s)
      Aoki, K., Ikeda, K., Kimura, H., Kobayashi, S.
    • Journal Title

      Journal of Institute of Systems, Control and Information Engineers Vol.18, No.3

      Pages: 81-88

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] An Extension of the Rational Policy Making Algorithm to Continuous State Spaces (2007)

    • Author(s)
      Miyazaki, K., Kimura, H., Kobayashi, S.
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.22, No.3

      Pages: 332-341

    • NAID

      10022007639

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Fusion of Soft Computing and Hard Computing for Large-scale Plants : A General Model (2005)

    • Author(s)
      Kamiya, A., Ovaska, S.J., Roy, S., Kobayashi, S.
    • Journal Title

      Applied Soft Computing Journal Vol.5, No.3

      Pages: 265-279

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Reinforcement Learning by GA using Importance Sampling (2005)

    • Author(s)
      土谷千加夫, 木村元, 佐久間淳, 小林重信
    • Journal Title

      Journal of Japanese Society for Artificial Intelligence Vol.20, No.1A

      Pages: 1-10

    • NAID

      10022004767

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Distributed Reinforcement Learning based on α-domination Strategy and its Application to Shared Resource Problems (2005)

    • Author(s)
      青木圭, 池田心, 木村元, 小林重信
    • Journal Title

      Journal of Institute of Systems, Control and Information Engineers Vol.18, No.3

      Pages: 81-88

    • NAID

      10014507798

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Offspring Generation by Policy Gradient Estimation using Importance Sampling (2005)

    • Author(s)
      土谷千加夫, 木村元, 小林重信
    • Journal Title

      Proceedings of the SICE 31st Intelligent Systems Symposium

      Pages: 145-150

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Policy Representation and Learning Efficiency in Policy Search by GA (2005)

    • Author(s)
      土谷千加夫, 佐久間淳, 小林重信
    • Journal Title

      Proceedings of the SICE 17th Symposium on Autonomous Decentralized Systems

      Pages: 289-294

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Distributed Reinforcement Learning using Bi-directional Decision Making for Multi-criteria Control of Multi-Stage Flow Systems (2004)

    • Author(s)
      K.Aoki, H.Kimura, S.Kobayashi
    • Journal Title

      Proc. of 8th Conf. on Intelligent Autonomous Systems

      Pages: 281-290

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Policy Learning by GA using Importance Sampling (2004)

    • Author(s)
      T.Tsuchiya, H.Kimura, S.Kobayashi
    • Journal Title

      Proc. of 8th Conf. on Intelligent Autonomous Systems

      Pages: 385-394

    • Related Report
      2004 Annual Research Report

Published: 2004-04-01   Modified: 2016-04-21  
