2022 Fiscal Year Annual Research Report

Designing a Practical Algorithm for Linear Bandits

Research Project

Project/Area Number	21J21272
Allocation Type	Single-year Grants
Research Institution	Kyoto University
Research Fellow	Taira Tsuchiya, Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC1)
Project Period (FY)	2021-04-28 – 2024-03-31
Keywords	Bandit problems / Machine learning
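Outline of Annual Research Achievements	Continuing from the previous year, this year's research focused on bandit algorithms that perform robustly regardless of the underlying environment. Bandit problems have two very different theoretical frameworks, the stochastic setting and the adversarial setting, distinguished by the mechanism that generates the observed feedback. In practice, however, it is difficult to judge which setting a real-world problem belongs to, and intermediate environments, such as stochastic environments with adversarial corruption, are also possible. In recent years, best-of-both-worlds (BOBW) algorithms, which achieve optimality in all of these settings simultaneously, have been actively studied. Existing BOBW algorithms, however, were limited to relatively simple sequential decision-making problems such as the expert problem and the multi-armed bandit problem, and it was unknown whether BOBW optimality could be achieved in structured bandit problems, which model real-world problems more precisely. This year, we developed BOBW policies for bandit problems with complex structure, such as online learning with feedback graphs and partial monitoring. Specifically, we worked within the follow-the-regularized-leader (FTRL) framework, which was originally developed for adversarial environments and has recently become the most prominent approach to BOBW optimality. By carefully designing its regularizer, we obtained the desired behavior of the arm-selection probabilities and thereby achieved BOBW optimality. These results were accepted at multiple top conferences in machine learning and learning theory.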
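The FTRL mechanism described above can be sketched for the plain K-armed bandit. As an assumption, the example swaps in the well-known 1/2-Tsallis-entropy regularizer (as used in Zimmert and Seldin's Tsallis-INF), an importance-weighted loss estimator and the learning rate eta_t = 1/sqrt(t); it is not the regularizer designed in this project for feedback graphs or partial monitoring. It only illustrates how the choice of regularizer determines the arm-selection probabilities, here p_i = 1/(eta * (L_i - x))^2 with a normalizer x found by bisection. The function names `tsallis_ftrl_probs` and `run_stochastic` are hypothetical.

```python
import numpy as np


def tsallis_ftrl_probs(cum_loss_est, eta, iters=50):
    """FTRL with the 1/2-Tsallis regularizer psi(p) = -(2/eta) * sum_i sqrt(p_i).

    Minimizing <p, L> + psi(p) over the probability simplex gives, by the
    stationarity condition, p_i = 1 / (eta * (L_i - x))**2 for a scalar x < min(L).
    The sum of these terms increases in x, so x is found by bisection.
    """
    cum_loss_est = np.asarray(cum_loss_est, dtype=float)
    k = cum_loss_est.size
    l_min = cum_loss_est.min()
    lo = l_min - np.sqrt(k) / eta  # every p_i <= 1/k here, so the sum is <= 1
    hi = l_min - 1.0 / eta         # the leading arm alone has p_i = 1 here, so the sum is >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        if np.sum(1.0 / (eta * (cum_loss_est - x)) ** 2) > 1.0:
            hi = x
        else:
            lo = x
    p = 1.0 / (eta * (cum_loss_est - 0.5 * (lo + hi))) ** 2
    return p / p.sum()  # absorb the small residual error of the bisection


def run_stochastic(mean_losses, horizon, seed=0):
    """Play FTRL-Tsallis on Bernoulli losses; return how often each arm was pulled."""
    rng = np.random.default_rng(seed)
    k = len(mean_losses)
    cum_loss_est = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for t in range(1, horizon + 1):
        eta = 1.0 / np.sqrt(t)  # assumed anytime learning rate (constant chosen for illustration)
        p = tsallis_ftrl_probs(cum_loss_est, eta)
        arm = rng.choice(k, p=p)
        loss = float(rng.random() < mean_losses[arm])  # only the chosen arm's loss is observed
        cum_loss_est[arm] += loss / p[arm]  # importance-weighted, unbiased loss estimate
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_stochastic([0.2, 0.5, 0.8], horizon=2000))
```

Nothing in the update assumes the losses are stochastic: the same loop could be fed adversarially chosen losses, which is why FTRL is a natural starting point for a single algorithm that is meant to work in both regimes. With well-separated Bernoulli means, as in the usage line, the pull counts should concentrate on the lowest-loss arm.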
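Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	Partial monitoring is a very general sequential decision-making problem that includes many practically important problems. Because of its complexity, achieving a BOBW policy for it was expected to be difficult, yet we were able to construct a BOBW algorithm with nearly optimal regret. On the other hand, in stochastic environments this algorithm may incur extra regret of up to the order of the number of arms as a cost of achieving BOBW optimality, and reducing it is an important open problem for improving practical performance.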
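For orientation, the regret notion discussed here can be written down for the standard K-armed bandit with losses in [0, 1]. The three bounds below are the well-known targets for that simpler problem (attained simultaneously by, for example, Tsallis-INF) and are shown only to make "optimal in both worlds" concrete; they are not the bounds proved in this project for partial monitoring or feedback graphs. The notation (A_t for the arm played at round t, mu_a for the mean loss of arm a in the stochastic regime, C for the total amount of adversarial corruption) is introduced here for illustration.

```latex
\begin{aligned}
R_T &= \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(A_t)\Big] - \min_{a \in [K]} \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(a)\Big],
\qquad \Delta_a = \mu_a - \min_{b \in [K]} \mu_b, \\
\text{adversarial:}\quad R_T &= O\big(\sqrt{KT}\big), \\
\text{stochastic:}\quad R_T &= O\Big(\sum_{a:\,\Delta_a > 0} \frac{\log T}{\Delta_a}\Big), \\
\text{corrupted stochastic:}\quad R_T &= O\Big(\sum_{a:\,\Delta_a > 0} \frac{\log T}{\Delta_a}
  + \sqrt{C \sum_{a:\,\Delta_a > 0} \frac{\log T}{\Delta_a}}\Big).
\end{aligned}
```

A BOBW algorithm is one whose single run, without knowing the regime in advance, satisfies all of these bounds at once.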
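Strategy for Future Research Activity	The results obtained so far have yielded BOBW policies that perform nearly optimally in both stochastic and adversarial environments for a very wide range of problem classes, mainly because partial monitoring contains a very large number of problem classes as special cases. Because of this generality, however, algorithms for partial monitoring are known to sometimes perform much worse than algorithms designed specifically for a particular special case. To address this, we consider it an important direction for future research to develop algorithms that adapt to the intrinsic difficulty of the underlying problem class.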

Research Products
(7 results)


Journal Article (5 results; peer reviewed: 5, open access: 5), Presentation (2 results)

[Journal Article] Best-of-Both-Worlds Algorithms for Partial Monitoring (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Journal Title
  
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT 2023)
  
  Volume: 201 Pages: 1484-1515
- Peer Reviewed / Open Access
[Journal Article] Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems (2023)
- Author(s)
  Junya Honda, Shinji Ito, and Taira Tsuchiya
- Journal Title
  
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT 2023)
  
  Volume: 201 Pages: 726-754
- Peer Reviewed / Open Access
[Journal Article] Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Proceedings of Thirty Fifth Conference on Learning Theory (COLT 2022)
  
  Volume: 178 Pages: 1421-1422
- Peer Reviewed / Open Access
[Journal Article] Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification (2022)
- Author(s)
  Junpei Komiyama, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
  
  Volume: 35 Pages: 10393-10404
- Peer Reviewed / Open Access
[Journal Article] Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
  
  Volume: 35 Pages: 28631-28643
- Peer Reviewed / Open Access
[Presentation] On the Stochastic and Adversarial Optimality of Follow-the-Perturbed-Leader Policies in Bandit Problems (2023)
- Author(s)
  Junya Honda, Shinji Ito, and Taira Tsuchiya
- Organizer
  Information-Based Induction Sciences and Machine Learning Workshop
[Presentation] Advances in Best-of-Both-Worlds Policies for Bandit Problems: Structured Bandits and Variance-Dependent Regret (2022)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  25th Information-Based Induction Sciences Workshop (IBIS 2022)
