Project/Area Number |
22K17947
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 61030:Intelligent informatics-related
|
Research Institution | The University of Tokyo |
Principal Investigator |
VO MinhDuc 東京大学, 大学院情報理工学系研究科, 特任助教 (40939906)
|
Project Period (FY) |
2022-04-01 – 2024-03-31
|
Project Status |
Completed (Fiscal Year 2023)
|
Budget Amount *help |
¥2,600,000 (Direct Cost: ¥2,000,000、Indirect Cost: ¥600,000)
Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000、Indirect Cost: ¥270,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000、Indirect Cost: ¥330,000)
|
Keywords | Vision and language / Novel object captioning / GANs / External knowledge / Bias mitigation / Story evaluation / Dataset / Conditional GANs / Long-tail data |
Outline of Research at the Start |
1) Creating a dataset for our study because existing datasets are insufficient. 2) Constructing vision-language cross-modal by learning cross-modal similarity. 3) Learning data augmentation using vision-language cross-modal. 4) Incorporating the vision-language cross-modal into the conditional GANs.
|
Outline of Final Research Achievements |
This study gains the knowledge about cross-modality between vision and language spaces. We built the knowledge base containing object's visual appearance and corresponding language description. We illustrated the efficacy of the collected knowledge base in enhancing the ability of describing unseen objects and predicting the future. We also explored new training paradigms of training generative adversarial networks under limited and open-set dataset as well as GAN inversion. This illustrated the ability of training a generative model when we cannot always harvest enough data to train a generative AI.
|
Academic Significance and Societal Importance of the Research Achievements |
We shows the efficacy of external knowledge base, helping AI in understanding up-to-date object knowledge and being able to predict the future given a sequence of sparsely temporally-ordered images. We showed the ability of generative AI when it is trained using limited number of training data.
|