2018 Fiscal Year Annual Research Report

Adaptation to Dynamic Environments by Groups of Non-Communicating Reinforcement Learning Agents (通信無し強化学習エージェント群による動的環境への追従)

Research Project

Project/Area Number	17J08724
Research Institution	The University of Electro-Communications
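Principal Investigator	上野 史 (UWANO Fumito), The University of Electro-Communications, Graduate School of Informatics and Engineering, Research Fellow (DC1)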
Project Period (FY)	2017-04-26 – 2020-03-31
Keywords	Multi-agent systems / Reinforcement learning / Dynamic change / Reward
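Outline of Annual Research Achievements	Multi-Agent Reinforcement Learning (MARL) is a technique in which multiple entities (agents), such as robots, that act appropriately on observed states learn cooperative behavior to solve difficult tasks. In practical environments, however, the cooperative behavior that is required changes over time, which makes it difficult for MARL to keep up. Aiming to establish foundational technology that widens the range of real environments to which MARL can be applied, this project addresses three themes over three years: 1. cooperative behavior learning methods that follow dynamic changes, 2. theoretical reinforcement of cooperative behavior learning, and 3. application to real-world problems. In fiscal year Heisei 30 (2018) we worked on Themes 1 and 2, mainly proposing and theoretically reinforcing non-communicative cooperative behavior learning methods that can follow three kinds of dynamic change: (1) the number of agents, (2) goal states and the number of goals, and (3) reward values. For Theme 3, we also devised (4) a learning method that uses inaccurate data, aimed at solving real-world problems. This year we focused particularly on theoretical reinforcement, showing theoretically the optimality of each proposed method, the conditions under which it holds, and the limits of its applicability. In addition, for (4) we incorporated several machine learning methods and devised a learning method suited to environments where only inaccurate information is available, thereby extending MARL while keeping theory as the main focus and preparing for future work. The results of task (1) were presented at the international conference PRIMA2018. The results of task (2), together with those of (1), have been submitted to the international conference ECML PKDD2019 and have been conditionally accepted by the English-language journal JCMSI. The results of task (3) were presented as a poster at the domestic conference SSI2018 and have been submitted to the international journal Machine Learning. The results of task (4) were presented at the international conference GECCO2018. These results have been well received externally.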
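Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: This fiscal year, centering on the second-year plan of theoretically reinforcing cooperative behavior learning, we proposed and theoretically reinforced learning methods that adapt to dynamic changes in the number of agents, in goal states and the number of goals, and in reward values. In preparation for the third-year task, we also proposed a learning method that adapts to inaccurate environmental information, and we showed the effectiveness and scope of applicability of these methods. Specifically, to follow dynamic changes in the number of agents, we proposed a method that restricts the goals each agent may achieve and demonstrated its effectiveness experimentally. We also showed theoretically that, by adjusting the number of goals each agent should achieve, cooperative behavior can be learned for any number of agents. To follow dynamic changes in goal states and the number of goals, we proposed an expected value defined as the total reward an agent has acquired divided by the number of times it has achieved a goal, and demonstrated its effectiveness experimentally. Because this expected value changes whenever the goal states or the number of goals change, learning while tracking it allows agents to learn cooperative behavior without communication as goal states and goal counts change. For dynamic changes in reward values, we proposed a new index that integrates the reward value with the minimum number of steps and verified its effectiveness experimentally. The index estimates the value of an action from the reward value and computes, in a pseudo manner, how many steps from the initial state the action would be taken when those values become equal; this makes it possible to learn cooperative behavior that follows dynamically changing reward values. Theoretical analysis further shows that the index can be evaluated independently of the magnitudes of the reward values and step counts. Finally, we proposed a method that learns appropriately from inaccurate environmental information in real-world problems. The findings show that knowledge represented as if-then rules is more robust to inaccurate data, and that complementing real-world information in advance with tree-structured learning makes the proposed method applicable even to inaccurate data. From the above, this year's research plan can be said to have been fully achieved.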
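The expected value described above (an agent's total acquired reward divided by its number of goal achievements) can be illustrated with a minimal Python sketch. This is not the project's implementation: the class name `GoalRewardTracker`, the sliding `window`, and the `tolerance` threshold used to flag a change are all hypothetical assumptions, since the report does not specify how agents detect or react to a shift in the value.

```python
from collections import deque


class GoalRewardTracker:
    """Tracks (total acquired reward) / (number of goal achievements).

    Hypothetical helper: the sliding window and the tolerance are
    illustrative assumptions, not taken from the report.
    """

    def __init__(self, window=10, tolerance=0.2):
        # Rewards obtained at the most recent goal achievements.
        self.rewards = deque(maxlen=window)
        self.tolerance = tolerance

    def expected_value(self):
        # Total acquired reward divided by the number of goal achievements.
        if not self.rewards:
            return 0.0
        return sum(self.rewards) / len(self.rewards)

    def record(self, reward):
        """Record one goal achievement; return True if the value shifted."""
        had_history = len(self.rewards) > 0
        before = self.expected_value()
        self.rewards.append(reward)
        after = self.expected_value()
        if not had_history:
            return False  # first sample: nothing to compare against
        # Flag a relative change larger than the (assumed) tolerance.
        return abs(after - before) > self.tolerance * max(abs(before), 1e-9)
```

An agent could call `record` each time it reaches a goal and, when it returns True, treat the environment (goal states, goal count, or reward) as having changed and relearn accordingly. How the relearning is done is outside this sketch.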
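Strategy for Future Research Activity	Of the three themes of the three-year project (1. cooperative behavior learning methods that follow dynamic changes, 2. theoretical reinforcement of cooperative behavior learning, and 3. application to real-world problems), we will mainly work on Theme 3. Specifically, we will propose, and theoretically reinforce, a cooperative behavior learning method that follows environments in which the dynamic changes addressed in fiscal year Heisei 30 (2018) occur in combination and unforeseen events (agent failures) also arise. In line with the third-year plan, we also intend to build an advanced simulation environment for a real-world problem, the transport of supplies in disaster areas, and verify the effectiveness of the proposed methods. Two issues are expected when applying the methods to this problem and must be addressed: 1) the learning speed, that is, the time from receiving environmental information to producing an answer, and 2) how to cope when the simulation environment differs from the real environment. In real-world problems the environment changes from moment to moment, so if learning takes too long its results may already be outdated; and even when learning succeeds, results learned in simulation are not necessarily applicable. To address these issues, we will first propose a method that can keep learning when an agent becomes inoperable during learning. Because such a failure changes the reward values each agent acquires, we believe it can be handled by learning while following that change. Next, we will work on speeding up learning by proposing a method that uses the information obtained from a single learning run and applies inverse reinforcement learning to estimate the values of states and actions that have not yet been learned. Finally, to cope with cases where simulation results cannot be applied to the real problem, we will propose a method that splits previously learned results and recombines them.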

Research Products
(25 results)

Journal Article (2 results) (of which Int'l Joint Research: 1 results, Peer Reviewed: 2 results, Open Access: 2 results) Presentation (21 results) (of which Int'l Joint Research: 12 results, Invited: 1 results) Book (1 results) Patent(Industrial Property Rights) (1 results) (of which Overseas: 1 results)

[Journal Article] Multi-Agent Cooperation Based on Reinforcement Learning with Internal Reward in Maze Problem (2018)
- Author(s)
  UWANO Fumito, TATEBE Naoki, TAJIMA Yusuke, NAKATA Masaya, KOVACS Tim, TAKADAMA Keiki
- Journal Title
  SICE Journal of Control, Measurement, and System Integration
  Volume: 11 Pages: 321-330
- DOI
  https://doi.org/10.9746/jcmsi.11.321
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Weighted Opinion Sharing Model for Cutting Link and Changing Information among Agents as Dynamic Environment (2018)
- Author(s)
  UWANO Fumito, SAITO Rei, TAKADAMA Keiki
- Journal Title
  SICE Journal of Control, Measurement, and System Integration
  Volume: 11 Pages: 331-340
- DOI
  https://doi.org/10.9746/jcmsi.11.331
- Peer Reviewed / Open Access
[Presentation] How to Select Appropriate Craters to Estimate Location Accurately in Comprehensive Situations for SLIM Project (2019)
- Author(s)
  Fumito Uwano, Takato Tatsumi, Akinori Murata, Keiki Takadama, Hiroyuki Kamata, Takayuki Ishida, Seisuke Fukuda, Shujiro Sawai, and Shinichiro Sakai
- Organizer
  The 32nd International Symposium on Space Technology and Science and the 9th Nano-Satellite Symposium
- Int'l Joint Research
[Presentation] How to Design Adaptable Agents to Obtain Consensus with Omoiyari (2019)
- Author(s)
  Yoshimiki Maekawa, Fumito Uwano, Eiki Kitajima, and Keiki Takadama
- Organizer
  The 21st International Conference on Human-Computer Interaction
- Int'l Joint Research / Invited
[Presentation] Niche Radius Adaptation in Bat Algorithm for Locating Multiple Optima in Multimodal Functions (2019)
- Author(s)
  Takuya Iwase, Ryo Takano, Fumito Uwano, Hiroyuki Sato, and Keiki Takadama
- Organizer
  IEEE Congress on Evolutionary Computation 2019
- Int'l Joint Research
[Presentation] Bat Algorithm with Dynamic Niche Radius for Multimodal Optimization (2019)
- Author(s)
  Takuya Iwase, Ryo Takano, Fumito Uwano, Hiroyuki Sato, and Keiki Takadama
- Organizer
  The 3rd International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence
- Int'l Joint Research
[Presentation] Maximum Entropy Inverse Reinforcement Learning with Incomplete Experts (2019)
- Author(s)
  Satoshi Hasegawa, Fumito Uwano, and Keiki Takadama
- Organizer
  The 24th International Symposium on Artificial Life and Robotics
- Int'l Joint Research
[Presentation] Novelty Search-based Bat Algorithm: Adjusting Distance among Solutions for Multimodal Optimization (2019)
- Author(s)
  Takuya Iwase, Ryo Takano, Fumito Uwano, Hiroyuki Sato, and Keiki Takadama
- Organizer
  The 22nd Asia Pacific Symposium on Intelligent and Evolutionary Systems
- Int'l Joint Research
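[Presentation] "Omoiyari" Based on Gap Compensation That Leads to Group Adaptation [集団適応を導くギャップ補填に基づく「思いやり」] (2019)
- Author(s)
  前川佳幹，上野史，北島瑛貴，髙玉圭樹
- Organizer
  The 33rd Annual Conference of the Japanese Society for Artificial Intelligence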
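[Presentation] Sustainable Behavior Acquisition through Neuroevolution for Virtual Robots with Redundancy against Failure [故障に対して冗長性を備えた仮想ロボットのニューロ進化による持続可能な行動獲得] (2019)
- Author(s)
  速水陽平，辰巳嵩豊，上野史，髙玉圭樹
- Organizer
  The 46th Symposium on Intelligent Systems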
[Presentation] Proposal of a Simulation Model of Diverse Information Propagation by Curious Agents [好奇心を持つエージェントによる多様性のある情報伝搬シミュレーションモデルの提案] (2019)
- Author(s)
  北島瑛貴，髙玉圭樹，村田暁紀，上野史
- Organizer
  HAI Symposium 2018
[Presentation] Strategy for Learning Cooperative Behavior with Local Information for Multi-agent Systems (2018)
- Author(s)
  Fumito Uwano and Keiki Takadama
- Organizer
  The 21st International Conference on Principles and Practice of Multi-Agent Systems
- Int'l Joint Research
[Presentation] Generalizing Rules by Random Forest-based Learning Classifier Systems for High-dimensional Data Mining (2018)
- Author(s)
  Fumito Uwano, Koji Dobashi, Keiki Takadama, and Tim Kovacs
- Organizer
  The Genetic and Evolutionary Computation Conference Companion 2018
- Int'l Joint Research
[Presentation] Analyzing Triangle Matching Method Based on Craters for Spacecraft Localization (2018)
- Author(s)
  Fumito Uwano, Haruyuki Ishii, Yuta Umenai, Kazuma Matsumoto, Takato Tatsumi, Akinori Murata, and Keiki Takadama
- Organizer
  The International Symposium on Artificial Intelligence, Robotics and Automation in Space 2018
- Int'l Joint Research
[Presentation] Correcting Wrongly Determined Opinions of Agents in Opinion Sharing Model (2018)
- Author(s)
  Eiki Kitajima, Caili Zhang, Haruyuki Ishii, Fumito Uwano, and Keiki Takadama
- Organizer
  The 20th International Conference on Human-Computer Interaction
- Int'l Joint Research
[Presentation] Multiple Swarm Intelligence Methods based on Multiple Population with Sharing Best Solution for Drastic Environmental Change (2018)
- Author(s)
  Yuta Umenai, Fumito Uwano, Hiroyuki Sato, and Keiki Takadama
- Organizer
  The Genetic and Evolutionary Computation Conference Companion 2018
- Int'l Joint Research
[Presentation] How to Detect Essential Craters in Camera Shot Image to Increase the Number of Spacecraft Location Estimation while Improving its Accuracy? (2018)
- Author(s)
  Haruyuki Ishii, Yuta Umenai, Kazuma Matsumoto, Fumito Uwano, Takato Tatsumi, Keiki Takadama, Hiroyuki Kamata, Takayuki Ishida, Seisuke Fukuda, Shujiro Sawai, and Shinichiro Sakai
- Organizer
  The International Symposium on Artificial Intelligence, Robotics and Automation in Space 2018
- Int'l Joint Research
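[Presentation] Fairness-Based Internal Reward Design for Non-Communicative Multi-Agent Cooperative Learning That Adapts to Dynamic Changes in Reward [報酬の動的変化に適応する通信無しマルチエージェント協調学習のための公平性に基づく内部報酬設定法] (2018)
- Author(s)
  上野史，髙玉圭樹
- Organizer
  SICE Systems and Information Division Annual Conference 2018 (SSI2018)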
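[Presentation] Evaluation and Accuracy Improvement of Self-Localization of the SLIM Spacecraft for Comprehensive Captured-Image Patterns [包括的な撮影画像パターンに対するSLIM探査機の自己位置推定の評価と精度向上] (2018)
- Author(s)
  上野史，村田暁紀，辰巳嵩豊，髙玉圭樹，鎌田弘之，石田貴行，福田盛介，澤井秀次郎，坂井真一郎
- Organizer
  The 62nd Space Sciences and Technology Conference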
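[Presentation] Inverse Reinforcement Learning from Incomplete Experts Based on Action Sequence Segmentation [行動系列分割に基づく不完全なエキスパートからの逆強化学習] (2018)
- Author(s)
  長谷川智，上野史，髙玉圭樹
- Organizer
  SICE Systems and Information Division Annual Conference 2018 (SSI2018)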
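[Presentation] Distributed Bat Algorithm for Multiple-Solution Search [複数解探索を考慮した分散型Bat Algorithm] (2018)
- Author(s)
  岩瀬拓哉，高野諒，上野史，佐藤寛之，髙玉圭樹
- Organizer
  SICE Systems and Information Division Annual Conference 2018 (SSI2018)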
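[Presentation] Opinion Sharing Algorithm That Suppresses False Information on Grid Networks [グリッドネットワーク上の誤報抑制意見共有アルゴリズム] (2018)
- Author(s)
  北島瑛貴，辰巳嵩豊，村田暁紀，上野史，髙玉圭樹
- Organizer
  SICE Systems and Information Division Annual Conference 2018 (SSI2018)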
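[Presentation] Unconstrained Real-Time Sleep Stage Estimation Method for Patients with Sleep Apnea Syndrome [睡眠時無呼吸症候群患者のための無拘束型リアルタイム睡眠段階推定法] (2018)
- Author(s)
  田島友祐，上野史，原田智広，髙玉圭樹
- Organizer
  Technical Committee on Healthcare and Medical Information Communication Technology (ヘルスケア・医療情報通信技術研究会)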
[Book] PRIMA 2018: Principles and Practice of Multi-Agent Systems (2018)
- Author(s)
  Fumito Uwano and Keiki Takadama
- Total Pages
  663~667
- Publisher
  Springer
- ISBN
  978-3-030-03098-8
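[Patent(Industrial Property Rights)] Point Cloud Matching Device, Point Cloud Matching Method, and Program [点群マッチング装置，点群マッチング方法及びプログラム] (2018)
- Inventor(s)
  髙玉圭樹，石井晴之，上野史
- Industrial Property Rights Holder
  髙玉圭樹，石井晴之，上野史
- Industrial Property Rights Type
  Patent
- Industrial Property Number
  特願2018-106820 (Japanese Patent Application No. 2018-106820)
- Overseas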
