Advancements in Online Optimization with Indirect Feedback

Research Project

Project/Area Number	24K23852
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1001:Information science, computer engineering, and related fields
Research Institution	The University of Tokyo
Principal Investigator	土屋 平, The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (00994683)
Project Period (FY)	2024-07-31 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000); Fiscal Year 2025: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2024: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Keywords	Machine learning / Learning theory / Online learning / Online convex optimization / Bandit problems
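Outline of Research at the Start	Online learning is a framework in which a learner makes decisions based on information obtained sequentially from the environment, with the aim of maximizing cumulative reward. Online learning admits many formulations; in particular, there are many problems in which the reward of the chosen action is not observed directly and only indirect feedback is available. A major recent focus in this field is the construction of algorithms that adapt to the properties of the environment, but existing adaptive algorithms for online learning under indirect feedback suffer from several forms of suboptimality. This research project therefore designs and theoretically analyzes algorithms that resolve these issues for online learning under indirect feedback.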