
Reinforcement learning method for environment with actuators that can be modeled with first-order lag elements or dead time elements

Research Project

Project/Area Number 18K11424
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution University of Tsukuba

Principal Investigator

Shibuya Takeshi  University of Tsukuba, Faculty of Engineering, Information and Systems, Assistant Professor (90582776)

Project Period (FY) 2018-04-01 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2020: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2019: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2018: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
Keywords Machine learning / Reinforcement learning / First-order lag element / Dead time element
Outline of Final Research Achievements

In this study, we designed compensators by the following three methods. The first method designs the compensator to reduce the difference in the successor state caused by the presence or absence of the first-order lag element and the dead time element. The second method designs the compensator to reduce the difference in the output of the first-order lag element caused by its presence or absence. The third method constructs an extended state for the first-order lag element via a low-dimensional representation that exploits the characteristics of first-order lag. Numerical simulations with a two-link manipulator and an inverted pendulum confirmed the effectiveness of these methods. Lastly, we studied a reinforcement learning method that adaptively switches control strategies according to environmental conditions.
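
The final report describes the actuator model only at this level of detail. As a minimal illustration (not the project's actual implementation), the Python sketch below simulates the two elements the study targets, a first-order lag element and a dead time element, inserted between an RL agent's action command and the plant input; all class and parameter names (FirstOrderLag, DeadTime, tau, n_delay) are hypothetical.

    from collections import deque

    class FirstOrderLag:
        """Discrete first-order lag element: tau * dy/dt + y = u,
        discretized with forward Euler (assumes dt << tau)."""
        def __init__(self, tau, dt, y0=0.0):
            self.alpha = dt / tau
            self.y = y0

        def step(self, u):
            self.y += self.alpha * (u - self.y)
            return self.y

    class DeadTime:
        """Dead time element: outputs the input received n_delay steps earlier."""
        def __init__(self, n_delay, u0=0.0):
            self.buf = deque([u0] * n_delay)

        def step(self, u):
            self.buf.append(u)
            return self.buf.popleft()

    # The agent's commanded action passes through both elements before it
    # reaches the plant, so the applied input is delayed and smoothed
    # relative to the command.
    lag, delay = FirstOrderLag(tau=0.5, dt=0.01), DeadTime(n_delay=20)
    applied = [lag.step(delay.step(1.0)) for _ in range(300)]  # step response

The mismatch this creates between commanded and applied actions is what the compensators described above are designed to absorb, without relearning the original policy.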

Academic Significance and Societal Importance of the Research Achievements

The results of this study have two main points of academic significance. First, even when a compensator is added after the fact, relearning can be made unnecessary: the method avoids the relearning that would otherwise arise when a policy trained in an environment without first-order lag or dead time elements is applied to an environment to which these elements have been added. Second, because the method does not directly use information about the output values of the first-order lag or dead time elements, no additional sensing of the environment is required. This property makes it possible to treat the environment, as viewed from the learning agent, as unchanged.
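
The report does not specify how the compensator achieves this, but the "no additional sensing" property suggests the compensator can maintain an internal model of the lag element rather than measure its real output. As a hedged sketch of that idea (one possible reading, not the project's algorithm): a wrapper that internally simulates a known discrete first-order lag and pre-distorts the command, so a policy trained without the lag can be reused unchanged. LagInvertingCompensator and all parameters are hypothetical, and the exact inverse used here can demand large commands, so a practical design would limit them.

    class LagInvertingCompensator:
        """Wraps a policy trained without the lag. Internally simulates the
        lag state (no sensing of the real lag output) and pre-distorts the
        command so the lagged output tracks the policy's intended action."""
        def __init__(self, policy, tau, dt, y0=0.0):
            self.policy = policy
            self.alpha = dt / tau
            self.y_hat = y0  # internal model of the lag element's state

        def act(self, state):
            u_des = self.policy(state)  # action the pretrained policy intends
            # Invert y_next = y + alpha * (c - y) so that y_next == u_des.
            c = self.y_hat + (u_des - self.y_hat) / self.alpha
            self.y_hat += self.alpha * (c - self.y_hat)  # y_hat becomes u_des
            return c

    policy = lambda s: 1.0  # hypothetical pretrained policy
    comp = LagInvertingCompensator(policy, tau=0.5, dt=0.01)
    command = comp.act(state=None)  # command sent to the lagged actuator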

Report

(4 results)
  • 2020 Annual Research Report / Final Research Report (PDF)
  • 2019 Research-status Report
  • 2018 Research-status Report
  • Research Products

    (3 results)


Journal Article (1 result; of which Peer Reviewed: 1, Open Access: 1) / Presentation (2 results; of which Int'l Joint Research: 1)

  • [Journal Article] Adaptive Modular Reinforcement Learning for Robot Controlled in Multiple Environments (2021)

    • Author(s)
      Teppei Iwata, Takeshi Shibuya
    • Journal Title

      IEEE Access

      Volume: -

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Presentation] Design of a Compensator for Reinforcement Learning in Environments with a Large First-Order Lag in Action Output (2019)

    • Author(s)
      Shoki Kobayashi, Takeshi Shibuya
    • Organizer
      Proceedings of the 76th Intelligent Systems Workshop (SIC2019-2)
    • Related Report
      2019 Research-status Report
  • [Presentation] Reinforcement Learning Method for Cases Where the State Observation Period Is Larger Than the Action Decision Period (2018)

    • Author(s)
      Masaki Yotsukura, Takeshi Shibuya
    • Organizer
      Proceedings of the SICE Annual Conference 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research

Published: 2018-04-23   Modified: 2022-01-27  
