2020 Fiscal Year Final Research Report
Construction of knowledge discovery algorithms based on information theoretic methods
Project/Area Number |
18K17998
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 60010:Theory of informatics-related
|
Research Institution | Kyoto University (2020); The University of Tokyo (2018-2019) |
Principal Investigator |
Honda Junya Kyoto University, Graduate School of Informatics, Associate Professor (10712391)
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Keywords | Machine learning / Information theory |
Outline of Final Research Achievements |
The multi-armed bandit problem asks how to explore and select among candidates appropriately under a limited number of trials. In this research, we investigated policies for this problem using techniques from information theory. In particular, we established theoretical guarantees for the policy called Thompson sampling from the viewpoint of the information-theoretic lower bound; this policy has often been used as an empirically promising heuristic. We also addressed the problem of identifying the candidate with the largest expected reward, rather than maximizing the cumulative reward. For this problem, existing formulations often required an unrealistically large number of trials and heavy computation. By appropriately characterizing the information-theoretic difficulty of the problem, we formulated problems that are feasible under a realistic number of trials and solvable with practical algorithms.
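For readers unfamiliar with Thompson sampling, the following is a minimal illustrative sketch, not the algorithm or setting analyzed in this project. It assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each candidate's mean; the function name `thompson_sampling` and the example reward means are hypothetical choices made for illustration only.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling on a Bernoulli bandit (illustrative assumption).

    true_means: hidden success probability of each candidate (arm).
    horizon: total number of trials.
    Returns the number of pulls per arm and the cumulative reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    total_reward = 0

    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior (uniform prior assumed).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


# Hypothetical example: three candidates, the third being the best.
pulls, total_reward = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls, total_reward)
```

Because each arm is chosen with the posterior probability that it is the best one, trials concentrate on the most promising candidate as evidence accumulates, while the other candidates are still tried occasionally.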
|
Free Research Field |
Machine learning
|
Academic Significance and Societal Importance of the Research Achievements |
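The results of this research clarify the applicability of Thompson sampling and its limits. This policy is already widely used in real-world applications such as recommender systems, and establishing its validity contributes to the safe operation of bandit policies in society. In addition, as the field has developed, there have been recent attempts to apply these policies to more socially sensitive problems such as clinical trials, but the number of trials available in these settings is far smaller than in settings such as recommender systems, which has been an obstacle. By formulating a framework in which meaningful guarantees are possible even in such settings, this research is significant in making bandit policies applicable to a broader range of settings in society.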
|