Designing a Practical Algorithm for Linear Bandits

Research Project

Project/Area Number	22KJ1680
Project/Area Number (Other)	21J21272 (2021-2022)
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund (2023) Single-year Grants (2021-2022)
Section	Domestic
Review Section	Basic Section 60030:Statistical science-related
Research Institution	The University of Tokyo (2023) Kyoto University (2021-2022)
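Principal Investigator	Taira Tsuchiya (2021, 2023) The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor
Research Fellow	Taira Tsuchiya (2022) Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC1)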
Project Period (FY)	2023-03-08 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥2,200,000 (Direct Cost: ¥2,200,000); Fiscal Year 2023: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2022: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2021: ¥800,000 (Direct Cost: ¥800,000)
Keywords	Machine learning / Learning theory / Bandit problems / Sequential decision-making problems / Online convex optimization / Best-of-both-worlds algorithms
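Outline of Research at the Start	Continuing from last fiscal year, this year's research studies bandit algorithms that operate robustly with respect to the underlying environment. Bandit problems have two theoretically very different frameworks, the stochastic setting and the adversarial setting, but it is difficult to judge which problem class a real-world problem belongs to. It is therefore desirable for a single algorithm to achieve optimality in both settings, a property called best-of-both-worlds optimality. Existing best-of-both-worlds algorithms are applicable only to relatively simple problems, and they also do not adapt sufficiently to the difficulty of the problem. This fiscal year we construct best-of-both-worlds algorithms that resolve the latter issue.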
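Outline of Annual Research Achievements	This fiscal year, we developed techniques that allow best-of-both-worlds algorithms, which achieve optimality simultaneously in stochastic and adversarial environments in sequential decision-making problems, to adapt to the difficulty of the smallest problem class to which the target problem belongs. Specifically, we considered partial monitoring, a very general sequential decision-making problem that includes many others, such as the multi-armed bandit problem and dynamic pricing, as special cases. To adapt to the difficulty of the smallest class to which the problem belongs, we constructed a learning rate for Follow-the-Regularized-Leader, the standard approach to achieving best-of-both-worlds optimality, that adapts simultaneously to the strength of its regularizer and to the stability of the algorithm. As a result, we succeeded in achieving best-of-both-worlds optimality and adaptivity to the difficulty of the problem at the same time. This result was accepted at NeurIPS 2023, the most prestigious international conference in the field of machine learning. In addition, for the combinatorial semi-bandit problem, in which the set of available actions has a combinatorial structure, we constructed a best-of-both-worlds algorithm that can simultaneously achieve multiple kinds of environment-dependent optimality. This result was accepted at AISTATS 2023, a leading international conference in machine learning and learning theory. Over the three years of the project as a whole, we succeeded in constructing algorithms with diverse forms of environmental adaptivity for sequential decision-making problems with various structures. In particular, best-of-both-worlds algorithms can achieve nearly optimal regret in environments intermediate between stochastic and adversarial, which arise frequently in real-world problems, making this an important contribution for practical applications. The original goal of this research was mainly to construct algorithms that operate robustly against noise in linear bandit problems, where observations have a linear structure; however, we succeeded in constructing algorithms that have not only robustness but also other forms of adaptivity, in more diverse problem settings than originally planned.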

Report

(3 results)

Research Products
(15 results)

All Journal Article (7 results) (of which Peer Reviewed: 7 results, Open Access: 7 results) Presentation (8 results) (of which Int'l Joint Research: 2 results, Invited: 1 results)

[Journal Article] Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Journal Title
  
  Proceedings of 26th International Conference on Artificial Intelligence and Statistics
  
  Volume: 206 Pages: 8117-8144
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Journal Title
  
  Neural Information Processing Systems
  
  Volume: 36 Pages: 47406-47437
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Best-of-Both-Worlds Algorithms for Partial Monitoring (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Journal Title
  
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT 2023)
  
  Volume: 201 Pages: 1484-1515
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems (2023)
- Author(s)
  Junya Honda, Shinji Ito, and Taira Tsuchiya
- Journal Title
  
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT 2023)
  
  Volume: 201 Pages: 726-754
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Proceedings of Thirty Fifth Conference on Learning Theory (COLT 2022)
  
  Volume: 178 Pages: 1421-1422
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification (2022)
- Author(s)
  Junpei Komiyama, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
  
  Volume: 35 Pages: 10393-10404
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, and Junya Honda
- Journal Title
  
  Advances in Neural Information Processing Systems 35 (NeurIPS 2022)
  
  Volume: 35 Pages: 28631-28643
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Best of Both Worlds Algorithms in Online Decision Making Problems (2024)
- Author(s)
  Taira Tsuchiya
- Organizer
  Machine Learning Summer School 2024
- Related Report
  2023 Annual Research Report
[Presentation] Best-of-Both-Worlds Algorithms for Partial Monitoring (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  The 125th Meeting of the Special Interest Group on Fundamental Problems in Artificial Intelligence (SIG-FPAI)
- Related Report
  2023 Annual Research Report
[Presentation] Adaptive Best-of-Both-Worlds Policy for Combinatorial Semi-Bandit Problems (2023)
- Author(s)
  Taira Tsuchiya
- Organizer
  The 22nd Forum on Information Technology (FIT2023)
- Related Report
  2023 Annual Research Report
- Invited
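[Presentation] FTRL with Regret Bounds That Depend Simultaneously on Multiple Observed Quantities in Online Decision-Making Problems, and Its Use to Achieve Sparsity-Dependent Bounds, Game-Dependent Bounds, and Best-of-Both-Worlds Optimality (2023)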
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  The 26th Information-Based Induction Sciences Workshop (IBIS2023)
- Related Report
  2023 Annual Research Report
[Presentation] Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  International Conference on Artificial Intelligence and Statistics (AISTATS 2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  Thirty-seventh Annual Conference on Neural Information Processing Systems 2023 (NeurIPS 2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] On the Stochastic and Adversarial Optimality of the Follow-the-Perturbed-Leader Policy in Bandit Problems (2023)
- Author(s)
  Junya Honda, Shinji Ito, and Taira Tsuchiya
- Organizer
  Information-Based Induction Sciences and Machine Learning Workshop
- Related Report
  2022 Annual Research Report
[Presentation] Advances in Best-of-Both-Worlds Policies for Bandit Problems: Structured Bandits and Variance-Dependent Regret (2022)
- Author(s)
  Taira Tsuchiya, Shinji Ito, and Junya Honda
- Organizer
  25th Information-Based Induction Sciences Workshop (IBIS 2022)
- Related Report
  2022 Annual Research Report
