
Development and Analysis of Randomized Policies for Achieving Optimality in Bandit Problems

Research Project

Project/Area Number 21K11747
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 60010: Theory of informatics-related
Research Institution Kyoto University

Principal Investigator

Honda Junya  Kyoto University, Graduate School of Informatics, Associate Professor (10712391)

Project Period (FY) 2021-04-01 – 2025-03-31
Project Status Completed (Fiscal Year 2024)
Budget Amount
¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2023: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
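Keywords bandit problems / machine learning / reinforcement learning / online learning / best-of-both-worlds optimality / clinical trials / learning theory / experimental design / information theory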
Outline of Research at the Start

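This research develops efficient algorithms for the bandit problem, a model of searching for good candidates through trial and error, with applications such as web recommendation systems and the development of new chemical compounds. A randomized policy called Thompson sampling has come into wide use for this problem in recent years, but the algorithm has little flexibility and can achieve the theoretical limit only in restricted settings. This research therefore aims to systematically understand and extract the principles by which randomized policies such as Thompson sampling perform well while avoiding complex computation, and thereby to establish a method for constructing practical randomized policies that can achieve the theoretical limit in general settings.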
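
As a rough illustration of the Thompson sampling policy named above, the following minimal Python sketch runs it on a Bernoulli bandit. The Beta(1, 1) prior, the Bernoulli reward model, and the function name `thompson_sampling` are assumptions chosen for illustration; they are not taken from the project itself.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Bernoulli Thompson sampling with Beta(1, 1) priors.

    true_means: hypothetical success probability of each arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Calling `thompson_sampling([0.2, 0.8], 2000)` returns a per-arm pull count; the randomness enters only through posterior sampling, so no explicit exploration schedule is needed.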

Outline of Final Research Achievements

This research focused on randomized policies for decision-making problems known as bandit problems. We clarified the applicability and limitations of Thompson sampling, an extensively studied policy, in problems such as dynamic pricing design and non-stationary settings. We also showed for the first time that the FTPL (Follow-the-Perturbed-Leader) policy admits a best-of-both-worlds guarantee. In addition, we constructed improved policies in a range of settings, including methods that use randomized policies built through frameworks such as reinforcement learning and FTRL (Follow-the-Regularized-Leader).

Academic Significance and Societal Importance of the Research Achievements

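Thompson sampling is now a standard policy in practice, so extending its range of application and clarifying its limitations is of great practical significance. The FTPL policy, unlike FTRL, which had been the mainstream approach in the same setting, is a fast policy that requires no optimization computation, and whether it achieves best-of-both-worlds optimality had been regarded as an open problem since the 2010s. This research resolved that question affirmatively, which is of significant academic importance for making policies with best-of-both-worlds optimality, a topic of active recent research, usable in practice.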
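
To make the "no optimization computation" point concrete, here is a minimal Python sketch of an FTPL-style bandit policy: each round it only samples perturbations and takes an argmin. The Fréchet shape parameter of 2, the learning rate 1/sqrt(t), the use of geometric resampling to estimate the inverse selection probability, the resampling cap, and all function names are assumptions made for this illustration, not a reproduction of the algorithms analyzed in the project.

```python
import math
import random


def frechet(rng, shape=2.0):
    """Sample from a Frechet distribution, F(x) = exp(-x**(-shape)), x > 0."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / shape)


def ftpl_choose(est_losses, eta, rng):
    """Pick the arm minimizing (scaled estimated loss - fresh perturbation)."""
    perturbed = [eta * loss - frechet(rng) for loss in est_losses]
    return min(range(len(est_losses)), key=lambda i: perturbed[i])


def run_ftpl(loss_fn, k, horizon, seed=0, max_resample=1000):
    """Illustrative FTPL loop; loss_fn(t, arm) returns a loss in [0, 1]."""
    rng = random.Random(seed)
    est = [0.0] * k      # importance-weighted cumulative loss estimates
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        arm = ftpl_choose(est, eta, rng)
        loss = loss_fn(t, arm)
        # Geometric resampling: the number of redraws until the same arm
        # reappears estimates 1 / P(arm). The cap is a simplification that
        # bounds running time at the cost of a small bias.
        m = 1
        while m < max_resample and ftpl_choose(est, eta, rng) != arm:
            m += 1
        est[arm] += loss * m
        pulls[arm] += 1
    return pulls
```

For example, `run_ftpl(lambda t, arm: [0.8, 0.2][arm], k=2, horizon=2000)` runs the sketch on two arms with fixed hypothetical losses. An FTRL-style policy would instead solve a regularized optimization problem over the probability simplex each round to obtain its arm-selection probabilities.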

Report

(5 results)
  • 2024 Annual Research Report   Final Research Report ( PDF )
  • 2023 Research-status Report
  • 2022 Research-status Report
  • 2021 Research-status Report
  • Research Products

    (26 results)


All Int'l Joint Research (2 results) Journal Article (20 results) (of which Int'l Joint Research: 6 results,  Peer Reviewed: 20 results,  Open Access: 19 results) Presentation (2 results) (of which Invited: 2 results) Remarks (2 results)

  • [Int'l Joint Research] Karlsruhe Institute of Technology (Germany)

    • Related Report
      2024 Annual Research Report
  • [Int'l Joint Research] Seoul National University (South Korea)

    • Related Report
      2024 Annual Research Report
  • [Journal Article] Multi-Player Approaches for Dueling Bandits (2025)

    • Author(s)
      Or Raveh, Junya Honda, Masashi Sugiyama
    • Journal Title

      The 28th International Conference on Artificial Intelligence and Statistics (AISTATS 2025)

      Volume: -

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits (2024)

    • Author(s)
      Junpei Komiyama, Edouard Fouche, Junya Honda
    • Journal Title

      Journal of Machine Learning Research

      Volume: 25 Pages: 1-56

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] The Survival Bandit Problem (2024)

    • Author(s)
      Charles Riou, Junya Honda, Masashi Sugiyama
    • Journal Title

      Transactions on Machine Learning Research

      Volume: -

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Learning with Posterior Sampling for Revenue Management under Time-varying Demand (2024)

    • Author(s)
      Kazuma Shimizu, Junya Honda, Shinji Ito, Shinji Nakadai
    • Journal Title

      The 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

      Volume: - Pages: 4911-4919

    • DOI

      10.24963/ijcai.2024/543

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring (2024)

    • Author(s)
      Taira Tsuchiya, Shinji Ito, Junya Honda
    • Journal Title

      The 41st International Conference on Machine Learning (ICML 2024)

      Volume: 245 Pages: 48768-48790

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Ratio Analysis and Best-of-Both-Worlds (2024)

    • Author(s)
      Shinji Ito, Taira Tsuchiya, Junya Honda
    • Journal Title

      The 37th Annual Conference on Learning Theory (COLT 2024)

      Volume: 247 Pages: 2522-2563

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Follow-the-Perturbed-Leader with Frechet-type Tail Distributions: Optimality in Adversarial Bandits and Best-of-Both-Worlds (2024)

    • Author(s)
      Jongyeong Lee, Junya Honda, Shinji Ito, Min-hwan Oh
    • Journal Title

      The 37th Annual Conference on Learning Theory (COLT 2024)

      Volume: 247 Pages: 3375-3430

    • Related Report
      2024 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Optimal dose escalation methods using deep reinforcement learning in phase I oncology trials (2023)

    • Author(s)
      Matsuura Kentaro, Sakamaki Kentaro, Honda Junya, Sozu Takashi
    • Journal Title

      Journal of Biopharmaceutical Statistics

      Volume: 33 Issue: 5 Pages: 639-652

    • DOI

      10.1080/10543406.2023.2170402

    • Related Report
      2023 Research-status Report
    • Peer Reviewed
  • [Journal Article] Stability-penalty-adaptive Follow-the-regularized-leader: Sparsity, Game-dependency, and Best-of-both-worlds (2023)

    • Author(s)
      Taira Tsuchiya, Shinji Ito, Junya Honda
    • Journal Title

      Advances in Neural Information Processing Systems

      Volume: 36 Pages: 47406-47437

    • Related Report
      2023 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Thompson Exploration with Best Challenger Rule in Best Arm Identification (2023)

    • Author(s)
      Jongyeong Lee, Junya Honda, Masashi Sugiyama
    • Journal Title

      Proceedings of the 15th Asian Conference on Machine Learning (ACML 2023)

      Volume: 222 Pages: 646-661

    • Related Report
      2023 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits (2023)

    • Author(s)
      Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama
    • Journal Title

      Proceedings of the 40th International Conference on Machine Learning (ICML2023)

      Volume: 202 Pages: 18810-18851

    • Related Report
      2023 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits (2023)

    • Author(s)
      Taira Tsuchiya, Shinji Ito, Junya Honda
    • Journal Title

      Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS2023)

      Volume: 206 Pages: 8117-8144

    • Related Report
      2023 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Best-of-Both-Worlds Algorithms for Partial Monitoring (2023)

    • Author(s)
      Taira Tsuchiya, Shinji Ito, Junya Honda
    • Journal Title

      Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT2023)

      Volume: 201 Pages: 1484-1515

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems (2023)

    • Author(s)
      Junya Honda, Shinji Ito, Taira Tsuchiya
    • Journal Title

      Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT2023)

      Volume: 201 Pages: 726-754

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds (2022)

    • Author(s)
      Shinji Ito, Taira Tsuchiya, Junya Honda
    • Journal Title

      Proceedings of The 35th Annual Conference on Learning Theory (COLT2022)

      Volume: 178 Pages: 1421-1422

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification (2022)

    • Author(s)
      Junpei Komiyama, Taira Tsuchiya, Junya Honda
    • Journal Title

      Advances in Neural Information Processing Systems

      Volume: 35 Pages: 10393-10404

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs (2022)

    • Author(s)
      Shinji Ito, Taira Tsuchiya, Junya Honda
    • Journal Title

      Advances in Neural Information Processing Systems

      Volume: 35 Pages: 28631-28643

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Bayesian optimization with partially specified queries (2022)

    • Author(s)
      Shogo Hayashi, Junya Honda, Hisashi Kashima
    • Journal Title

      Machine Learning

      Volume: 111 Issue: 3 Pages: 1019-1048

    • DOI

      10.1007/s10994-021-06079-3

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Optimal adaptive allocation using deep reinforcement learning in a dose‐response study (2021)

    • Author(s)
      Matsuura Kentaro, Honda Junya, El Hanafi Imad, Sozu Takashi, Sakamaki Kentaro
    • Journal Title

      Statistics in Medicine

      Volume: 41 Issue: 7 Pages: 1157-1171

    • DOI

      10.1002/sim.9247

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences (2021)

    • Author(s)
      Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama
    • Journal Title

      Proceedings of the 38th International Conference on Machine Learning

      Volume: 139 Pages: 11637-11647

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Randomization-Based Construction of Asymptotically Optimal Policies in Bandit Problems (2023)

    • Author(s)
      Junya Honda
    • Organizer
      The 35th RAMP Mathematical Optimization Symposium (RAMP 2023)
    • Related Report
      2023 Research-status Report
    • Invited
  • [Presentation] Toward General-Purpose Sequential Decision-Making Algorithms (2022)

    • Author(s)
      Junya Honda
    • Organizer
      The 48th IBISML Technical Meeting
    • Related Report
      2022 Research-status Report
    • Invited
  • [Remarks]

    • URL

      https://www.jmlr.org/papers/v25/21-0916.html

    • Related Report
      2024 Annual Research Report
  • [Remarks]

    • URL

      https://proceedings.mlr.press/v247/lee24a.html

    • Related Report
      2024 Annual Research Report


Published: 2021-04-28   Modified: 2026-01-16  
