FY2012 Annual Research Report

Advanced Research on Exploitation-oriented Learning (XoL)

Research Project

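Project/Area Number	22500143
Research Institution	National Institution for Academic Degrees and University Evaluation
Principal Investigator	Kazuteru Miyazaki, National Institution for Academic Degrees and University Evaluation, Research Department, Associate Professor (20282866)
Project Period (FY)	2010-04-01 – 2013-03-31
Keywords	Reinforcement learning / Machine learning / Intelligent machines / Agents / Exploitation-oriented learning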
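Research Summary	In FY2012, as originally planned, we completed a "method that handles multiple types of rewards and penalties with continuous-valued inputs and outputs" and presented "design guidelines for rewards and penalties in XoL." These results were presented at a domestic symposium (the 2012 IEEJ Electronics, Information and Systems Society Conference) and at an international conference (2nd ATISR), and they are also scheduled to appear in an international journal (Journal of Computers). As for applications of the method, we applied it to biped walking robots and to the course classification support system of the National Institution for Academic Degrees and University Evaluation. On the robot side, we applied the method to a Keepaway Task with multiple LEGO robots; this result is to appear in the Journal of Computers mentioned above. We also applied it to waist trajectory learning for a tendon-driven biped walking robot; this result was published in the Journal of Advanced Computational Intelligence and Intelligent Informatics. For the course classification support system, we completed the XoL-based learning function and presented it at an international conference (SCIS-ISIS 2012). This prepares the ground for the planned next steps, "implementing database creation and update functions" and "verifying effectiveness in categories other than information engineering." By presenting both "design guidelines for rewards and penalties" and "application examples in multiple domains" for XoL, we believe we have made a strong case for the significance of XoL as a learning method based on trial and error. These results are also scheduled to appear in Part 5 of the relay commentary series "Recent Developments in Reinforcement Learning" in Keisoku to Seigyo (Measurement and Control), the journal published by the Society of Instrument and Control Engineers, which is expected to reach a general readership that includes industry.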
Current Status of Achievement (Category)	Reason: Not entered because FY2012 is the final year.
Strategy for Future Research	Not entered because FY2012 is the final year.

Research Products

(13 items)


Journal articles (9, of which 6 peer reviewed) / Presentations (4)

[Journal article] Proposal of an Exploitation-oriented Learning Method on Multiple Rewards and Penalties Environments and the Design Guideline (2013)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Journal
  Journal of Computers
  Volume: in press; Pages: in press
- Peer reviewed
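[Journal article] Relay Commentary "Recent Developments in Reinforcement Learning," Part 5: Application-oriented "Goal-directed Learning Based on Trial and Error," Exploitation-oriented Learning; XoL (2013)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Journal
  Keisoku to Seigyo (Measurement and Control)
  Volume: Vol.52, No.5; Pages: in press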
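[Journal article] A Study on the Effectiveness of the Failure Probability Propagation Algorithm EFP in Multi-agent Environments (2013)
- Author(s)/Presenter(s)
  村岡宏紀, Kazuteru Miyazaki, Hiroaki Kobayashi
- Journal
  Proceedings of the 40th Symposium on Intelligent Systems
  Volume: none; Pages: 319-324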
[Journal article] Introduction of Fixed Mode States into Online Reinforcement Learning with Penalty and Reward and Its Application to Waist Trajectory Generation of Biped Robot (2012)
- Author(s)/Presenter(s)
  Seiya Kuroda, Kazuteru Miyazaki and Hiroaki Kobayashi
- Journal
  Journal of Advanced Computational Intelligence and Intelligent Informatics
  Volume: Vol.16, No.6; Pages: 758-768
- Peer reviewed
[Journal article] Proposal and Evaluation of the Active Course Classification Support System with Exploitation-oriented Learning (2012)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki and Masaaki Ida
- Journal
  Lecture Notes in Computer Science
  Volume: Vol.7188; Pages: 333-344
- DOI
  10.1007/978-3-642-29946-9_32
- Peer reviewed
[Journal article] Introduction of Fixed Mode States into Online Profit Sharing and Its Application to Waist Trajectory Generation of Biped Robot (2012)
- Author(s)/Presenter(s)
  Seiya Kuroda, Kazuteru Miyazaki and Hiroaki Kobayashi
- Journal
  Lecture Notes in Computer Science
  Volume: Vol.7188; Pages: 297-308
- DOI
  10.1007/978-3-642-29946-9_29
- Peer reviewed
[Journal article] Proposal of an Exploitation-oriented Learning Method on Multiple Rewards and Penalties Environments (2012)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Journal
  Proc. of the 2nd International Conference on Applied and Theoretical Information Systems Research (2nd ATISR)
  Volume: none; Pages: 9 pages (CD)
- Peer reviewed
[Journal article] Proposal of an Active Course Classification Support System with Exploitation-oriented Learning Extended by Positive and Negative Examples (2012)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki and Masaaki Ida
- Journal
  Proc. of the 6th International Conference on Soft Computing and Intelligent Systems and the 13th International Symposium on Advanced Intelligent Systems (SCIS-ISIS 2012)
  Volume: none; Pages: 1520-1527
- Peer reviewed
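[Journal article] A Study on the Proposal and Design Guidelines of Exploitation-oriented Learning Supporting Multiple Types of Rewards and Penalties (2012)
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Journal
  Proceedings of the 2012 IEEJ Electronics, Information and Systems Society Conference
  Volume: none; Pages: 559-564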
[Presentation] Proposal of an Exploitation-oriented Learning Method on Multiple Rewards and Penalties Environments
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Conference
  The 2nd International Conference on Applied and Theoretical Information Systems Research (2nd ATISR)
- Venue
  Grand Hotel (圓山大飯店), Taipei
[Presentation] Proposal of an Active Course Classification Support System with Exploitation-oriented Learning Extended by Positive and Negative Examples
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Conference
  The 6th International Conference on Soft Computing and Intelligent Systems and the 13th International Symposium on Advanced Intelligent Systems (SCIS-ISIS 2012)
- Venue
  Kobe Convention Center
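[Presentation] A Study on the Effectiveness of the Failure Probability Propagation Algorithm EFP in Multi-agent Environments
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Conference
  The 40th Symposium on Intelligent Systems
- Venue
  Kyoto Institute of Technology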
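[Presentation] A Study on the Proposal and Design Guidelines of Exploitation-oriented Learning Supporting Multiple Types of Rewards and Penalties
- Author(s)/Presenter(s)
  Kazuteru Miyazaki
- Conference
  The 2012 IEEJ Electronics, Information and Systems Society Conference
- Venue
  Hirosaki University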
