2022 Fiscal Year Research-status Report

Research on the innovative evolution of deep reinforcement learning based on the profit sharing principle and its application to real problems

Research Project

Project/Area Number	21K12024
Research Institution	National Institution for Academic Degrees and Quality Enhancement of Higher Education
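Principal Investigator	Miyazaki Kazuteru (宮崎和光), National Institution for Academic Degrees and Quality Enhancement of Higher Education, Research Department, Professor (20282866)
Co-Investigator (Kenkyū-buntansha)	Yamaguchi Shu (山口周), National Institution for Academic Degrees and Quality Enhancement of Higher Education, Research Department, Specially Appointed Professor (10182437); Harada Taku (原田拓), Tokyo University of Science, Faculty of Science and Technology, Department of Industrial Administration, Associate Professor (70256668); Kodama Naoki (小玉直樹), Meiji University, School of Science and Technology, Assistant Professor (60908747)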
Project Period (FY)	2021-04-01 – 2024-03-31
Keywords	deep reinforcement learning / profit sharing principle / deep exploitation-oriented learning / smart energy system / traffic signal control / tweet data / robot control
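Outline of Annual Research Achievements	In this fiscal year, the research centered on the "exploration of applications" using the methods based on the profit sharing principle (PS principle) that we have proposed so far. In the journal article "Traffic Signal Control System Using Deep Reinforcement Learning With Emphasis on Reinforcing Successful Experiences," we took traffic signal control as the subject and confirmed the effectiveness of the Dual Targeting Algorithm (DTA), a method based on the PS principle. In particular, confirming the effectiveness of DTA in a multi-agent environment, which had not been established before, is highly significant and is an important result for the exploration of applications. Furthermore, in "Proposal and Evaluation of a Course-Classification-Support System Emphasizing Communication with the Sub-committees Within the Committee of Validation and Examination for Degrees," we advanced the research and development of the course classification support system, which is the core of the curriculum analysis support system that this project names as an application. In addition, in the oral presentation "Suppression of Negative Tweets Using Reinforcement Learning in a Multi-Agent Environment" (マルチエージェント環境下における強化学習を用いたネガティブツイートの抑制), we verified the theorem on indirect rewards in multi-agent environments and showed that the method based on the PS principle suppresses negative tweets better than other methods. In summary, this fiscal year we pursued the exploration of applications and also obtained results that contribute to one of the sub-goals, namely "to organize the relationship with indirect rewards in multi-agent environments and clarify the effectiveness of the PS principle in multi-agent environments."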
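Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: In the previous fiscal year, this project proposed an "exploitation-oriented learning method that suppresses variance," which is the main goal of the project. Building on that result, this fiscal year focused on the "exploration of applications" and the "achievement of the sub-goals." For the exploration of applications, we pursued applications in multi-agent environments, taking traffic signal control and tweet data as subjects. Numerical experiments showed that, in both applications, the methods based on the PS principle work more effectively than other methods. In addition, we advanced the research and development of the course classification support system, the main component of the curriculum analysis system planned from the start of this project. For the achievement of the sub-goals, we concentrated on one of the two sub-goals, namely "to organize the relationship with indirect rewards in multi-agent environments and clarify the effectiveness of the PS principle in multi-agent environments." In particular, in the study on tweet data, we verified the theorem on indirect rewards in multi-agent environments using actual tweet data. Based on these results, we judged that this project is "progressing generally smoothly."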
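Strategy for Future Research Activity	This fiscal year we pursued the "exploration of applications" using methods based on the PS principle; from here on, we intend to develop those applications further. For traffic signal control, for example, we are considering learning in more complex multi-agent environments that are closer to real conditions. In the study on tweet data, a combination with deep learning has not yet been realized. We therefore plan to introduce deep learning into the tweet categorization component and to verify the effectiveness of the approach as "deep exploitation-oriented learning (DeePS)" based on the PS principle. In the research and development of the course classification support system, this fiscal year we only verified it as a deep learning method, and a combination with reinforcement learning or exploitation-oriented learning has not been realized. Going forward, we therefore plan to examine a more effective support system by combining the course classification support system with DeePS. Regarding the "achievement of the sub-goals," we obtained results for one of the two sub-goals, namely "to organize the relationship with indirect rewards in multi-agent environments and clarify the effectiveness of the PS principle in multi-agent environments," but we have not yet obtained results for the other sub-goal, "to organize the relationship between the PS principle and eligibility traces and clarify its effectiveness in classes beyond Markov decision processes (MDPs)." In the remaining research period, we will therefore focus on achieving the latter sub-goal and on wrapping up the project.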
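Causes of Carryover	In the next fiscal year, in order to achieve the sub-goal "to organize the relationship between the PS principle and eligibility traces and clarify its effectiveness in classes beyond Markov decision processes (MDPs)," we plan learning experiments with a robot that moves around in a real environment, which is a class beyond MDPs. It became clear that the budget reserved for the next (final) fiscal year alone would not cover the purchase of the robot, so the necessary funds had to be secured by carrying over part of this fiscal year's budget, which produced the amount to be used in the next fiscal year.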

Research Products
(13 results)

Journal Article (4 results) (of which Peer Reviewed: 4 results, Open Access: 1 result) Presentation (9 results) (of which Int'l Joint Research: 4 results)

[Journal Article] Proposal and Evaluation of a Course-Classification-Support System Emphasizing Communication with the Sub-committees Within the Committee of Validation and Examination for Degrees (2023)
- Author(s)
  Miyazaki Kazuteru, Yamaguchi Syu, Mori Rie, Yoshikawa Yumiko, Saito Takanori, Suzuki Toshiya
- Journal Title
  Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
  Volume: 477 Pages: 123-130
- DOI
  10.1007/978-3-031-29126-5_10
- Peer Reviewed
[Journal Article] Surface Hydroxyl-Ion Diffusion and Hierarchical Structure of Adsorbed Water on Hydrated Layered Double Hydroxides (2023)
- Author(s)
  Yamasaki Tomoyuki, Iimura Soshi, Hosono Hideo, Yamaguchi Shu
- Journal Title
  The Journal of Physical Chemistry C
  Volume: 127 Pages: 6045-6053
- DOI
  10.1021/acs.jpcc.3c00275
- Peer Reviewed
[Journal Article] Traffic Signal Control System Using Deep Reinforcement Learning With Emphasis on Reinforcing Successful Experiences (2022)
- Author(s)
  Kodama Naoki, Harada Taku, Miyazaki Kazuteru
- Journal Title
  IEEE Access
  Volume: 10 Pages: 128943-128950
- DOI
  10.1109/access.2022.3225431
- Peer Reviewed / Open Access
[Journal Article] Modeling of placebo effect in stochastic reward tasks by reinforcement learning (2022)
- Author(s)
  Miyazaki Kazuteru
- Journal Title
  Procedia Computer Science
  Volume: 213 Pages: 255-262
- DOI
  10.1016/j.procs.2022.11.064
- Peer Reviewed
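[Presentation] Suppression of Negative Tweets Using Reinforcement Learning in a Multi-Agent Environment (マルチエージェント環境下における強化学習を用いたネガティブツイートの抑制) (2023)
- Author(s)
  Miyazaki Kazuteru
- Organizer
  50th Symposium on Intelligent Systems (第50回知能システムシンポジウム)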
[Presentation] Effectiveness of Character-level CNN and its Examination of Perturbation for Weights (2023)
- Author(s)
  Miyazaki Kazuteru, Ida Masaaki
- Organizer
  28th International Symposium on Artificial Life and Robotics (AROB 28th 2023)
- Int'l Joint Research
[Presentation] Learning Thresholds to Select Cooperative Partners by Applying Deep Reinforcement Learning in Distributed Traffic Signal Control (2023)
- Author(s)
  Matsuta Shinya, Kodama Naoki, Harada Taku
- Organizer
  38th International Conference on Computers and Their Applications
- Int'l Joint Research
[Presentation] Distributed Traffic Signal Control with Fairness Using Deep Reinforcement Learning (2023)
- Author(s)
  Shirasaka Shogo, Kodama Naoki, Harada Taku
- Organizer
  SICE International Symposium on Control Systems 2023
- Int'l Joint Research
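[Presentation] Suppression of Negative Tweets Using Reinforcement Learning (強化学習を用いたネガティブツイートの抑制) (2022)
- Author(s)
  Miyazaki Kazuteru
- Organizer
  SICE Systems and Information Division Annual Conference 2022 (計測自動制御学会システム・情報部門学術講演会2022)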
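[Presentation] Atari 2600 Simulation with Exploitation-Oriented Deep Reinforcement Learning (経験強化型深層強化学習による Atari2600 シミュレーション) (2022)
- Author(s)
  Kodama Naoki, Harada Taku, Miyazaki Kazuteru
- Organizer
  SICE Systems and Information Division Annual Conference 2022 (計測自動制御学会システム・情報部門学術講演会2022)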
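[Presentation] Proposal of an Explainable Deep Reinforcement Learning Method (説明可能な深層強化学習法の提案) (2022)
- Author(s)
  Kodama Naoki, Harada Taku, Miyazaki Kazuteru
- Organizer
  IEEJ Division C (Electronics, Information and Systems) Conference (電気学会C部門大会)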
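[Presentation] Discovering Papers Useful for BioDOS Using Deep Learning (深層学習を利用したBioDOS にとって有用な論文の発見) (2022)
- Author(s)
  Miyazaki Kazuteru, Kiga Daisuke, Yasuda Shoya, Hamada Ritsuki, Kodama Naoki, Yamamura Masayuki
- Organizer
  IEEJ Division C (Electronics, Information and Systems) Conference (電気学会C部門大会)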
[Presentation] Rule-based generation of synthetic genetic circuits (2022)
- Author(s)
  Kiga Daisuke, Miyazaki Kazuteru, Yasuda Shoya, Hamada Ritsuki, Okuda Sota, Sekine Ryoji, Kodama Naoki, Yamamura Masayuki
- Organizer
  14th International Workshop on Bio-Design Automation (IWBDA 2022)
- Int'l Joint Research
