2022 Fiscal Year Final Research Report

Theory Deepening for Practical Applications of Bandit Problem Policies

Project/Area Number	19H04161
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	Hokkaido University
Principal Investigator	Nakamura Atsuyoshi, Hokkaido University, Faculty of Information Science and Technology, Professor (50344487)
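Co-Investigator (Kenkyū-buntansha)	Tabata Koji, Hokkaido University, Research Institute for Electronic Science, Associate Professor (20814445); Kudo Mineichi, Hokkaido University, Faculty of Information Science and Technology, Professor (60205101)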
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	Bandit problems / Online learning
Outline of Final Research Achievements	In both the adversarial and the stochastic bandit settings, we formalized problems motivated by practical use, proposed efficient, high-performance algorithms for solving them, and evaluated these algorithms theoretically and experimentally. In the adversarial bandit setting, we developed an asymptotically optimal algorithm for the case in which at least one arm never incurs any loss. In the stochastic setting, we formalized the classification bandit problem, in which the player, by drawing arms iteratively, decides whether the number of arms whose expected rewards are at least a given threshold is itself at least another given threshold; for this problem we developed P-tracking, an efficient and asymptotically optimal algorithm. These results have been published in major peer-reviewed international journals and conference proceedings.
Free Research Field	Machine learning, data mining
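Academic Significance and Societal Importance of the Research Achievements	Bandit algorithms have long been studied as a way to conduct clinical trials efficiently, and today they are used in Internet ad delivery, recommender systems, A/B testing, and similar applications. At its core, this is research on methods that gather information efficiently through active sampling, so it has potential across a wide range of applications. The classification bandit algorithm we developed can also be used to speed up pathological diagnosis based on interactive Raman spectroscopy measurement, and we expect it to be extended to applications in many other fields.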
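The classification bandit problem summarized in the Outline can be made concrete with a small simulation. The sketch below is a simple round-robin elimination baseline using Hoeffding confidence bounds, written only to illustrate the problem setting; it is not the P-tracking algorithm, whose details are not given in this report. The Bernoulli reward model, the function name `classification_bandit_sketch`, and its parameters (`theta` for the reward threshold, `L` for the count threshold, `delta`, `max_pulls`, `seed`) are hypothetical assumptions.

```python
import math
import random


def classification_bandit_sketch(means, theta, L, delta=0.05, max_pulls=200_000, seed=0):
    """Hypothetical elimination baseline for the classification bandit problem.

    NOT the P-tracking algorithm. Assumes Bernoulli arms with means `means`
    (unknown to the player). Returns (answer, pulls): answer is True if at least
    L arms are judged to have mean >= theta, False if fewer than L can, or None
    if the pull budget runs out first.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    undecided = set(range(K))
    positives = 0  # arms confidently classified as mean >= theta
    pulls = 0
    while True:
        # Stop as soon as the answer is determined, even if some arms are unclassified.
        if positives >= L:
            return True, pulls
        if positives + len(undecided) < L:
            return False, pulls
        if pulls >= max_pulls:
            return None, pulls
        # Draw every still-undecided arm once (sorted() copies, so discarding is safe).
        for i in sorted(undecided):
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            sums[i] += reward
            pulls += 1
            n = counts[i]
            # Anytime Hoeffding radius; a union bound over arms and sample sizes
            # keeps the overall misclassification probability below delta.
            radius = math.sqrt(math.log(4 * K * n * n / delta) / (2 * n))
            mean_hat = sums[i] / n
            if mean_hat - radius >= theta:
                undecided.discard(i)
                positives += 1
            elif mean_hat + radius < theta:
                undecided.discard(i)
```

For example, `classification_bandit_sketch([0.9, 0.8, 0.2, 0.1], theta=0.5, L=2)` should answer `True` once the two high-mean arms clear the threshold, while `L=3` should yield `False` once the two low-mean arms are ruled out. Sampling uniformly in this way is easy to reason about but wastes draws on arms that do not affect the answer; the report states that P-tracking is asymptotically optimal, and this baseline makes no such claim.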