Unifying Object Detection and Image Captioning using Vision-Language Knowledge Base for Open-World Comprehension

Research Project

Project/Area Number	24K20830
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	The University of Tokyo
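Principal Investigator	Vo Minh Duc, The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (40939906)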
Project Period (FY)	2024-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
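Budget Amount	¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000); Fiscal Year 2025: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000); Fiscal Year 2024: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)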
Keywords	vision-language / image captioning / object recognition
Outline of Research at the Start	Object detection and image captioning are closely connected tasks, yet each can recognize and describe objects that lie beyond the scope of the other. This research pursues a more comprehensive and cohesive understanding of visual content by unifying the two tasks within a generative framework. We aim to develop a vision-language knowledge base method that not only detects and describes objects present in the training dataset but also handles novel objects not seen during training.