2020 Fiscal Year Final Research Report

Reinforcement learning method for environment with actuators that can be modeled with first-order lag elements or dead time elements

Project/Area Number	18K11424
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	University of Tsukuba
Principal Investigator	Shibuya Takeshi, University of Tsukuba, Faculty of Engineering, Information and Systems, Assistant Professor (90582776)
Project Period (FY)	2018-04-01 – 2021-03-31
Keywords	Machine learning / Reinforcement learning
Outline of Final Research Achievements	In this study, compensators were designed using three methods. The first method designs the compensator to reduce the difference in the successor state caused by the presence or absence of the first-order lag element and the dead time. The second method designs the compensator to reduce the difference in the output of the first-order lag element caused by the presence or absence of that element. The third method designs an extended state for the first-order lag element as a low-dimensional representation that exploits the characteristics of the first-order lag. Numerical simulations using a two-link manipulator or an inverted pendulum were performed to confirm the effectiveness of these methods. Lastly, we studied a reinforcement learning method that adaptively switches its control strategy according to environmental conditions.
Free Research Field	Machine learning
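Academic Significance and Societal Importance of the Research Achievements	The results of this research have two main points of academic significance. First, relearning can be made unnecessary even when a compensator is added afterward. When learning is performed in an environment that contains no first-order lag or dead-time elements and these elements are added later, the relearning that would otherwise be required in the new environment can be avoided. Second, because information on the output values of the first-order lag or dead-time elements is not used directly, no additional sensing of the environment is needed. Owing to this property, everything on the far side as viewed from the environment can be treated as unchanged.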
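
For readers unfamiliar with the two actuator models named in the outline, the following is a minimal sketch of how a first-order lag element and a dead-time element distort an action signal in discrete time. The report gives no equations, so the class names, the forward-Euler discretization, and the parameters `tau`, `dt`, and `delay_steps` are illustrative assumptions; the compensators and the extended state developed in the project are not reproduced here.

```python
from collections import deque


class FirstOrderLag:
    """First-order lag, tau * dy/dt + y = u, discretized with forward Euler.

    Hypothetical helper for illustration; not taken from the report.
    """

    def __init__(self, tau, dt, y0=0.0):
        self.tau = tau  # time constant (assumed parameter)
        self.dt = dt    # sampling period (assumed parameter)
        self.y = y0     # current actuator output

    def step(self, u):
        # The output moves a fraction dt/tau of the way toward the command u.
        self.y += (self.dt / self.tau) * (u - self.y)
        return self.y


class DeadTime:
    """Pure delay of `delay_steps` samples, implemented as a FIFO buffer.

    Hypothetical helper for illustration; not taken from the report.
    """

    def __init__(self, delay_steps, initial=0.0):
        self.buffer = deque([initial] * delay_steps)

    def step(self, u):
        # Push the new command and release the one issued delay_steps ago.
        self.buffer.append(u)
        return self.buffer.popleft()


if __name__ == "__main__":
    lag = FirstOrderLag(tau=0.5, dt=0.1)
    delay = DeadTime(delay_steps=3)
    for t in range(6):
        command = 1.0  # constant action chosen by the agent
        print(t, round(lag.step(command), 4), delay.step(command))
```

In both cases the effect of an action on the plant depends on earlier actions, so an agent that observes only the plant state no longer faces the dynamics it was trained on. This is the mismatch that the compensators and the extended state described in the outline are meant to address.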