2023 Fiscal Year Research-status Report

Development and Analysis of Randomized Policies for Achieving Optimality in Bandit Problems

Research Project

Project/Area Number	21K11747
Research Institution	Kyoto University
Principal Investigator	Honda Junya, Kyoto University, Graduate School of Informatics, Associate Professor (10712391)
Project Period (FY)	2021-04-01 – 2025-03-31
Keywords	machine learning / bandit problems / online learning / clinical trials
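Outline of Annual Research Achievements	Among randomized policies for bandit problems, this fiscal year's research first constructed a variance-adaptive policy for bandit problems with combinatorial structure. Settings with combinatorial structure, such as simultaneous recommendation of several items in a recommender system or path search, tend in practical applications to have rewards whose actual variability is small relative to the maximum range of the losses or rewards. For such settings, we constructed a new policy that has order-optimal performance guarantees in both the stochastic and the adversarial setting (a property called best-of-both-worlds optimality) while adapting to reward sequences with small variance. Most policies with best-of-both-worlds optimality are constructed by decomposing the loss (regret) of the policy into two terms, called the stability term and the penalty term, and choosing a learning rate that balances them appropriately. Until now, analysis methods were known only for policies whose learning rate depends dynamically on one of the two terms, either the stability term or the penalty term, and this restricted the performance guarantees that could be achieved. In contrast, this research established a new analysis technique for learning rates that depend dynamically on both the stability term and the penalty term, and showed that this in fact yields strong guarantees in a wide range of settings. In addition, among randomized policies for stochastic environments, Thompson sampling is known as a policy that achieves near-optimal performance at low computational cost, but its performance guarantees had been limited to a few settings that are comparatively easy to analyze. This research newly showed that, depending on the prior, Thompson sampling for the Pareto distribution model incurs a loss of polynomial order rather than the usual logarithmic order, and presented a way to correct this. Furthermore, for best arm identification, where the goal is to find a good candidate rather than to maximize cumulative reward, we constructed a policy that achieves strong performance by applying Thompson sampling techniques, and we also constructed a policy for finding a good dose in actual phase I clinical trials.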
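As a point of reference for the Thompson sampling discussion above, the following is a minimal sketch of the basic idea in the simplest textbook setting: Bernoulli rewards with a uniform Beta prior. It is not the Pareto-model policy studied in the project, and the function name, parameters, and reward means are hypothetical, chosen only for illustration.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return (pull counts, total reward).

    Assumption: rewards are Bernoulli and each arm starts from a uniform
    Beta(1, 1) prior. `true_means` is a hypothetical list of success rates
    that is used only to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its current Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    pulls = [successes[i] + failures[i] for i in range(k)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling_bernoulli([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls, "total reward:", total)
```

The randomization comes entirely from sampling the posterior, so exploration needs no separate tuning parameter. As the report notes, how well this works can depend on the reward model and the choice of prior.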
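Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: The construction of policies with best-of-both-worlds optimality is a competitive topic on which research has advanced in recent years, and in this project the results described above were accepted at the top international conferences NeurIPS and AISTATS, which is a very favorable outcome. In addition, the work on policies for the classical stochastic setting was accepted at the international conferences ICML and ACML, and the practical application to clinical trials was accepted by the drug-development journal Journal of Biopharmaceutical Statistics, so the project has progressed extremely well on both the theoretical and the applied side.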
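Strategy for Future Research Activity	Based on the progress described above, we plan to continue research on randomized policies, particularly those that can handle adversarial settings. Specifically, for partial monitoring, a generalization of the bandit problem, a policy called Exploration by Optimization, which is based on optimizing an upper bound on the loss, has attracted attention in recent years. It is known, however, to be somewhat poorly suited to achieving best-of-both-worlds optimality, and we are considering how to address this issue.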
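Causes of Carryover	To further refine the results obtained so far, we are currently extending policies that achieve best-of-both-worlds optimality to partial monitoring. The carried-over funds will be used in the next fiscal year for travel to present these results at international conferences and to exchange information with other researchers.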

Research Products
(6 results)

All 2023

All Journal Article (5 results) (of which Int'l Joint Research: 1 results, Peer Reviewed: 5 results, Open Access: 4 results) Presentation (1 results) (of which Invited: 1 results)

[Journal Article] Optimal dose escalation methods using deep reinforcement learning in phase I oncology trials (2023)
- Author(s)
  Matsuura Kentaro, Sakamaki Kentaro, Honda Junya, Sozu Takashi
- Journal Title
  Journal of Biopharmaceutical Statistics
  Volume: 33 Pages: 639–652
- DOI
  10.1080/10543406.2023.2170402
- Peer Reviewed
[Journal Article] Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, Junya Honda
- Journal Title
  Advances in Neural Information Processing Systems
  Volume: 36 Pages: 47406–47437
- Peer Reviewed / Open Access
[Journal Article] Thompson Exploration with Best Challenger Rule in Best Arm Identification (2023)
- Author(s)
  Jongyeong Lee, Junya Honda, Masashi Sugiyama
- Journal Title
  Proceedings of the 15th Asian Conference on Machine Learning (ACML 2023)
  Volume: 222 Pages: 646–661
- Peer Reviewed / Open Access
[Journal Article] Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits (2023)
- Author(s)
  Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama
- Journal Title
  Proceedings of the 40th International Conference on Machine Learning (ICML2023)
  Volume: 202 Pages: 18810–18851
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, Junya Honda
- Journal Title
  Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS2023)
  Volume: 206 Pages: 8117–8144
- Peer Reviewed / Open Access
[Presentation] Construction of Asymptotically Optimal Policies for Bandit Problems Based on Randomization (2023)
- Author(s)
  Honda Junya
- Organizer
  35th RAMP Mathematical Optimization Symposium (RAMP 2023)
- Invited
