Development and Analysis of Randomized Policies for Achieving Optimality in Bandit Problems

Research Project

Project/Area Number	21K11747
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 60010:Theory of informatics-related
Research Institution	Kyoto University
Principal Investigator	Junya Honda (本多淳也), Kyoto University, Graduate School of Informatics, Associate Professor (10712391)
Project Period (FY)	2021-04-01 – 2024-03-31
Project Status	Granted (Fiscal Year 2022)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
	Fiscal Year 2023: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
	Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
	Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Keywords	Machine learning / Learning theory / Experimental design / Information theory
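Outline of Research at the Start	This research develops efficient algorithms for the bandit problem, a model of searching for good candidates through trial and error, with applications such as web recommendation systems and the development of new chemical compounds. A randomized policy called Thompson sampling has come into wide use for this problem in recent years, but the algorithm has little flexibility and achieves the theoretical limit only in restricted settings. This research therefore aims to systematically understand and extract the principles by which randomized policies such as Thompson sampling perform well while avoiding complex computation, and thereby to establish a method for constructing practical randomized policies that achieve the theoretical limit in general settings.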
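As a concrete picture of the Thompson sampling policy mentioned above, here is a minimal sketch for a Bernoulli bandit. It is not taken from the project: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example arm means are all assumptions chosen for illustration.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled.

    Hypothetical helper for illustration only (assumes Bernoulli rewards
    and a uniform Beta(1, 1) prior on every arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one sample per arm from its current Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the simulated environment.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Example arm means are made up; the last arm is the best one.
pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

The randomness comes only from posterior sampling, so no confidence bound or optimization problem has to be computed in any round.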
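Outline of Annual Research Achievements	This fiscal year we mainly constructed and analyzed randomized bandit policies that are robust in the setting known as adversarial bandits. In an adversarial bandit, rewards are generated by an adversary rather than drawn from a fixed probability distribution, and a policy that simultaneously achieves the optimal order in both the stochastic and adversarial settings is called Best-Of-Both-Worlds (BOBW). Handling the adversarial setting makes the use of a randomized policy essential for such policies. We proposed a new BOBW policy that can exploit information on the variance of the rewards even in the adversarial setting, and proved that its loss is at most about twice that of policies specialized to each setting. This result was accepted at COLT2022, a top conference in learning theory. Next, bandits with graph feedback and partial monitoring are known as problem classes that generalize the bandit problem. For these settings we constructed new BOBW policies by incorporating a technique called "exploration by optimization," which has recently become known in a different context. These results were accepted at NeurIPS2022, a top conference in machine learning, and ALT2023, a top conference in learning theory. All of the BOBW policies above require the probability distribution for random selection to be computed by solving an optimization problem at every time step. In contrast, whether Follow-The-Perturbed-Leader (FTPL), a randomized policy that requires no such optimization, can achieve the BOBW property had long been an open problem; we resolved it affirmatively using algebraic techniques involving symmetric polynomials. This result was accepted at ALT2023. In addition, for the setting called best arm identification, we established a new theoretical limit that applies to general policies, including randomized ones. This result was accepted at NeurIPS2022.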
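To illustrate why FTPL needs no per-round optimization, here is a minimal sketch for the multi-armed bandit with losses in [0, 1]. It should not be read as the algorithm analyzed in the project's paper. The Fréchet perturbation with shape 2, the learning rate 1/sqrt(t), the truncation `max_resample`, the simulated environment, and all function names are assumptions chosen for illustration. The loss estimate uses geometric resampling, a standard device for FTPL under bandit feedback.

```python
import math
import random

def frechet(rng, shape=2.0):
    # Inverse-CDF sampling: if E ~ Exp(1), then E ** (-1 / shape) is Frechet(shape).
    e = max(rng.expovariate(1.0), 1e-300)  # guard against an exact zero
    return e ** (-1.0 / shape)

def ftpl_choose(cum_loss_est, eta, rng):
    # Perturbed leader: the arm with the smallest estimated cumulative loss
    # after subtracting fresh random noise scaled by 1 / eta.
    k = len(cum_loss_est)
    return min(range(k), key=lambda i: cum_loss_est[i] - frechet(rng) / eta)

def ftpl_bandit(loss_fn, k, horizon, seed=0, max_resample=1000):
    """Hypothetical FTPL sketch; loss_fn(t, arm) must return a loss in [0, 1]."""
    rng = random.Random(seed)
    cum_loss_est = [0.0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        arm = ftpl_choose(cum_loss_est, eta, rng)
        loss = loss_fn(t, arm)
        # Geometric resampling: the number of fresh draws until `arm` is
        # selected again estimates 1 / P(arm), truncated at max_resample.
        m = 1
        while m < max_resample and ftpl_choose(cum_loss_est, eta, rng) != arm:
            m += 1
        # Importance-weighted loss estimate for the played arm only.
        cum_loss_est[arm] += loss * m
        pulls[arm] += 1
    return pulls

# Simulated stochastic environment with made-up Bernoulli loss means;
# the last arm has the smallest expected loss.
env_rng = random.Random(1)
means = [0.9, 0.5, 0.1]
pulls = ftpl_bandit(lambda t, a: float(env_rng.random() < means[a]),
                    k=3, horizon=2000)
print(pulls)
```

Each round only draws random perturbations and takes an arg-min, whereas a policy defined through a regularized optimization problem must solve that problem to obtain its sampling distribution.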
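Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: Research on BOBW policies requires a deep understanding of both the stochastic and adversarial settings. By combining the principal investigator's expertise in the stochastic setting with collaborators who have deep knowledge of the adversarial setting, the research advanced substantially, yielding the highly favorable outcome of five papers accepted at top conferences.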
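Strategy for Future Research Activity	Given this progress, for the time being we plan to focus on constructing and analyzing randomized policies that can handle the adversarial setting. In addition to natural extensions and improvements of the results obtained through fiscal year 2022, we plan to pursue themes such as the fundamental limits of BOBW policies in the stochastic setting. Preliminary studies suggest that the latter is out of reach at least for conventional standard analysis techniques, so we are also considering approaches that have not been used at all in this context, such as analyzing the policy as a stochastic process.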

Report

(2 results)

2022 Research-status Report
2021 Research-status Report

Research Products

(9 results)

All 2023 2022 2021

All Journal Article (8 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 8 results, Open Access: 8 results) Presentation (1 results) (of which Invited: 1 results)

[Journal Article] Best-of-Both-Worlds Algorithms for Partial Monitoring (2023)
- Author(s)
  Taira Tsuchiya, Shinji Ito, Junya Honda
- Journal Title
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT2023)
  Volume: 201 Pages: 1484-1515
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Follow-the-Perturbed-Leader Achieves Best-of-Both-Worlds for Bandit Problems (2023)
- Author(s)
  Junya Honda, Shinji Ito, Taira Tsuchiya
- Journal Title
  Proceedings of The 34th International Conference on Algorithmic Learning Theory (ALT2023)
  Volume: 201 Pages: 726-754
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, Junya Honda
- Journal Title
  Proceedings of The 35th Annual Conference on Learning Theory (COLT2022)
  Volume: 178 Pages: 1421-1422
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification (2022)
- Author(s)
  Junpei Komiyama, Taira Tsuchiya, Junya Honda
- Journal Title
  Advances in Neural Information Processing Systems
  Volume: 35 Pages: 10393-10404
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs (2022)
- Author(s)
  Shinji Ito, Taira Tsuchiya, Junya Honda
- Journal Title
  Advances in Neural Information Processing Systems
  Volume: 35 Pages: 28631-28643
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Bayesian optimization with partially specified queries (2022)
- Author(s)
  Shogo Hayashi, Junya Honda, Hisashi Kashima
- Journal Title
  Machine Learning
  Volume: 111 Issue: 3 Pages: 1019-1048
- DOI
  10.1007/s10994-021-06079-3
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Optimal adaptive allocation using deep reinforcement learning in a dose-response study (2021)
- Author(s)
  Matsuura Kentaro, Honda Junya, El Hanafi Imad, Sozu Takashi, Sakamaki Kentaro
- Journal Title
  Statistics in Medicine
  Volume: 41 Issue: 7 Pages: 1157-1171
- DOI
  10.1002/sim.9247
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences (2021)
- Author(s)
  Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama
- Journal Title
  Proceedings of the 38th International Conference on Machine Learning
  Volume: 139 Pages: 11637-11647
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Toward General-Purpose Sequential Decision-Making Algorithms (2022)
- Author(s)
  Junya Honda
- Organizer
  The 48th IBISML Workshop (第48回IBISML研究会)
- Related Report
  2022 Research-status Report
- Invited
