Vision-and-language cross-modal learning for training conditional GANs with long-tail data.
Project/Area Number | 22K17947 |
Research Category | Grant-in-Aid for Early-Career Scientists |
Allocation Type | Multi-year Fund |
Review Section | Basic Section 61030: Intelligent informatics-related |
Research Institution | The University of Tokyo |
Principal Investigator | VO Minh Duc, The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (40939906) |
Project Period (FY) | 2022-04-01 – 2024-03-31 |
Project Status | Completed (Fiscal Year 2023) |
Budget Amount | ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000) |
Keywords | Vision and language / Novel object captioning / GANs / External knowledge / Bias mitigation / Story evaluation / Dataset / Conditional GANs / Long-tail data |
Outline of Research at the Start |
1) Creating a new dataset for our study, because existing datasets are insufficient. 2) Constructing a vision-language cross-modal space by learning cross-modal similarity (sketched below). 3) Learning data augmentation using the vision-language cross-modal space. 4) Incorporating the vision-language cross-modal space into conditional GANs.
|
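As an illustration of step 2, the following is a minimal, hypothetical sketch (not the project's actual code) of learning cross-modal similarity with a CLIP-style symmetric contrastive loss; the function name, temperature value, and the assumption of separate image and text encoders are illustrative only.

```python
# Hypothetical sketch: learning a joint vision-language space by pulling
# matching image/text embedding pairs together and pushing mismatches apart.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors from separate encoders
    (e.g. a vision backbone and a text backbone); pairs share the same index.
    """
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are matching pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Contrast each image against all texts, and each text against all images.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

In such a setup, the paired image and text embeddings at the same batch index act as positives, while every other pairing in the batch serves as a negative.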
Outline of Annual Research Achievements |
We expanded our knowledge of the cross-modality between the vision and language spaces and obtained four achievements. 1. Using commonsense knowledge, we can anticipate the future given a sparse set of temporally ordered images; this work was published at CVPR 2023. 2. We explored training GANs on limited and open-set data, as well as GAN inversion; these three papers were published at WACV 2024. 3. We built a new knowledge base containing image features and their corresponding object names (sketched below), and used it to propose a novel object captioning method that outperforms prior methods while being comparable to LLMs; this work will be published at CVPR 2024. 4. We also gained knowledge about bias mitigation in image classification using a mixture of bias-specific experts; this work was published at ICCV 2023.
|
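To illustrate achievement 3, here is a minimal, hypothetical sketch (not the published CVPR 2024 method) of how a knowledge base of (image feature, object name) pairs could be queried by cosine similarity so that a captioning model can name objects that never appear in its caption training data; the class and method names are invented for illustration.

```python
# Hypothetical sketch: nearest-neighbor lookup of object names from an
# external knowledge base that stores (image feature, object name) pairs.
import numpy as np

class ObjectNameKnowledgeBase:
    def __init__(self, features, names):
        # features: (N, dim) array of image features; names: list of N object names.
        self.features = features / np.linalg.norm(features, axis=1, keepdims=True)
        self.names = names

    def lookup(self, query_feature, top_k=3):
        """Return the object names whose stored features are most similar
        (by cosine similarity) to the query image feature."""
        q = query_feature / np.linalg.norm(query_feature)
        sims = self.features @ q
        top = np.argsort(-sims)[:top_k]
        return [(self.names[i], float(sims[i])) for i in top]

# Usage: the retrieved names could then be fed to a captioning model so it can
# mention objects unseen in the captioning training set.
kb = ObjectNameKnowledgeBase(np.random.randn(1000, 512),
                             [f"object_{i}" for i in range(1000)])
names = kb.lookup(np.random.randn(512))
```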
Report (2 results)
Research Products (18 results)