Speed and Fusion of Evolutionary Computation and Reinforcement Learning by Importance Sampling
Project/Area Number |
16300040
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
KOBAYASHI Shigenobu Interdisciplinary Graduate School of Science and Engineering, Professor (40016697)
|
Co-Investigator(Kenkyū-buntansha) |
SAKUMA Jun Interdisciplinary Graduate School of Science and Engineering, Research Associate (90376963)
KIMURA Hajime Kyushu University, Graduate School of Engineering, Associate Professor (40302963)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥15,000,000 (Direct Cost: ¥15,000,000)
Fiscal Year 2006: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2005: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2004: ¥7,800,000 (Direct Cost: ¥7,800,000)
|
Keywords | Evolutionary Computation / Genetic Algorithms / Real-coded Genetic Algorithms / Reinforcement Learning / Importance Sampling / Instance-based Policy / Multi-objective Optimization / Hybrid Genetic Algorithms / Policy Formation / Global Optimization / Pareto Descent Method / Actor-Critic / Policy Learning / Policy Gradient Method / Stochastic Gradient Method |
Research Abstract |
Reinforcement learning addresses policy search problems, that is, searching for a mapping from the state space to the action space. However, reinforcement learning is based on gradient methods and therefore cannot handle problems with a multimodal landscape. Genetic algorithms, in contrast, are promising for such problems but appear unsuitable for policy search because of the cost of evaluation. We incorporate importance sampling into the framework of genetic algorithms in order to reduce the evaluation cost of policy search. The proposed method was successfully applied to Markov decision processes with multimodal landscapes.

Reinforcement learning is a useful tool for complex control problems that can be neither modeled mathematically nor solved theoretically. However, traditional value-function approaches such as Q-learning suffer from combinatorial explosion. Direct policy search is an alternative approach that represents a policy using some model and searches the parameter space directly for an optimum using optimization techniques such as genetic algorithms. An instance-based policy is one such policy representation model. It represents a policy using a set of instances, each of which is a pair of a state and an action. We present a hybrid GA that efficiently optimizes a set of instances with continuous states and continuous actions, given an episodic task. The proposed method, named SLIP, was applied to a cat twist problem and a parallel-type double inverted pendulum problem. Experiments show the effectiveness and usefulness of SLIP.

Much attention has been paid to genetic algorithms as a potent multi-objective optimization method, and the effectiveness of hybridizing them with local search has recently been reported. However, existing local search methods each have drawbacks, such as high computational cost and inefficiency in improving the objective functions. We introduce the concept of Pareto descent directions: descent directions to which no other descent direction is superior in improving all objective functions. Moving solutions in such directions is expected to maximally improve all objective functions simultaneously. We propose a new local search method, the Pareto descent method, which finds Pareto descent directions and moves solutions in those directions. When some or all of these directions are infeasible, it finds feasible Pareto descent directions or descent directions as necessary and moves solutions in these directions. The proposed method finds these directions by solving linear programming problems and is therefore computationally inexpensive. Experiments have shown that the Pareto descent method is superior to the existing methods.
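
The abstract does not spell out how importance sampling reduces evaluation cost, so the following is only a minimal sketch of the general idea under stated assumptions: episodes collected with one policy (the "behavior" policy) are reweighted by likelihood ratios to estimate the expected return of a different policy (the "target" policy) without running it. The function name `importance_sampled_return`, the episode layout, and the toy one-step task are all hypothetical and are not taken from the project's publications.

```python
def importance_sampled_return(episodes, target_prob, behavior_prob):
    """Ordinary importance-sampling estimate of a target policy's expected return.

    episodes: list of (steps, episode_return), where steps is a list of
              (state, action) pairs generated by the behavior policy.
    target_prob / behavior_prob: functions (state, action) -> probability.
    """
    total = 0.0
    for steps, episode_return in episodes:
        # Likelihood ratio of the whole episode under the two policies.
        weight = 1.0
        for state, action in steps:
            weight *= target_prob(state, action) / behavior_prob(state, action)
        total += weight * episode_return
    return total / len(episodes)


# Toy one-step task (an assumption for illustration): action 1 pays 1, action 0 pays 0.
def behavior(state, action):
    return 0.5  # the behavior policy picks either action uniformly

def target(state, action):
    return 0.9 if action == 1 else 0.1  # the target policy prefers action 1

# One stored episode per action, as if sampled from the uniform behavior policy.
stored_episodes = [([(0, 0)], 0.0), ([(0, 1)], 1.0)]
estimate = importance_sampled_return(stored_episodes, target, behavior)
print(estimate)
```

In a genetic-algorithm setting, an estimate of this kind could in principle let a new candidate policy be scored from episodes already gathered for its parents, which is one way to read the abstract's claim about reducing evaluation cost.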
|
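
The notion of one descent direction being "superior" to another in the abstract can be illustrated with a small check on directional derivatives. This is only a sketch of the definition, assuming minimization, unit-length directions, and known gradients; it does not implement the linear-programming step the abstract mentions, and the helper names `directional_derivatives`, `is_descent_direction`, and `dominates` are hypothetical.

```python
def directional_derivatives(gradients, direction):
    """Rate of change of each objective along `direction` (gradient dot direction)."""
    return [sum(g_k * d_k for g_k, d_k in zip(g, direction)) for g in gradients]

def is_descent_direction(gradients, direction):
    """True if every objective decreases along `direction`."""
    return all(v < 0 for v in directional_derivatives(gradients, direction))

def dominates(gradients, d1, d2):
    """True if d1 improves every objective at least as fast as d2, and one strictly faster."""
    a = directional_derivatives(gradients, d1)
    b = directional_derivatives(gradients, d2)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Two objectives in two dimensions (illustrative gradients, not from the source).
gradients = [(1.0, 0.0), (1.0, 1.0)]
d_a = (-1.0, 0.0)   # derivatives: -1.0 and -1.0
d_c = (-0.6, -0.8)  # derivatives: -0.6 and about -1.4
d_e = (-0.8, 0.6)   # derivatives: -0.8 and about -0.2

print(is_descent_direction(gradients, d_e))  # d_e is a descent direction
print(dominates(gradients, d_a, d_e))        # but d_a improves both objectives faster
print(dominates(gradients, d_a, d_c), dominates(gradients, d_c, d_a))
```

In this toy case `d_e` is a descent direction but is dominated by `d_a`, while `d_a` and `d_c` do not dominate each other, so among these three candidates only `d_a` and `d_c` remain non-dominated in the sense the abstract describes.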
Report
(4 results)
Research Products
(32 results)
[Journal Article] Saving MGG: Reducing Fitness Evaluations for Real-coded GA/MGG (2006)
Author(s)
Tanaka, M., Tsuchiya, H., Sakuma, J., Ono, I., Kobayashi, S.
-
Journal Title
Journal of Japanese Society for Artificial Intelligence Vol.21, No.6
Pages: 547-555
NAID
Description
From "Summary of Research Results Report (English)"
Related Report
-
[Journal Article] SLIP: A Sophisticated Learner for Instance-based Policy using Hybrid GA (2006)
Author(s)
Tsuchiya, C., Shiokawa, Y., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
-
Journal Title
Transactions of Society of Instrument and Control Engineers Vol.42, No.12
Pages: 1344-1352
NAID
Description
From "Summary of Research Results Report (English)"
Related Report
-
[Journal Article] Local Search for Multiobjective Function Optimization: Pareto Descent Method (2006)
Author(s)
Harada, K., Sakuma, J., Ikeda, K., Ono, I., Kobayashi, S.
-
Journal Title
Journal of Japanese Society for Artificial Intelligence Vol.21, No.4
Pages: 340-350
NAID
Description
From "Summary of Research Results Report (English)"
Related Report
-
[Journal Article] Hybridization of Genetic Algorithm with Local Search in Multiobjective Function Optimization: Recommendation of GA then LS (2006)
Author(s)
Harada, K., Ikeda, K., Sakuma, J., Ono, I., Kobayashi, S.
-
Journal Title
Journal of Japanese Society for Artificial Intelligence Vol.21, No.6
Pages: 482-492
NAID
Description
From "Summary of Research Results Report (English)"
Related Report