2023 Fiscal Year Final Research Report
Vision-and-language cross-modal learning for training conditional GANs with long-tail data
Project/Area Number | 22K17947
Research Category | Grant-in-Aid for Early-Career Scientists
Allocation Type | Multi-year Fund
Review Section | Basic Section 61030: Intelligent informatics-related
Research Institution | The University of Tokyo
Principal Investigator | VO MinhDuc, The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (40939906)
Project Period (FY) | 2022-04-01 – 2024-03-31
Keywords | Vision and language / Novel object captioning / GANs / External knowledge
Outline of Final Research Achievements | This study investigated cross-modality between the vision and language spaces. We built a knowledge base that pairs objects' visual appearances with their corresponding language descriptions, and we demonstrated its efficacy in enhancing the description of unseen objects and the prediction of future events. We also explored new paradigms for training generative adversarial networks (GANs) on limited and open-set datasets, as well as GAN inversion. These results demonstrate that a generative model can be trained even when enough data to train a generative AI cannot always be harvested.
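As an illustration of the knowledge-base idea above, here is a minimal sketch (not the project's actual code) of retrieving language descriptions for an unseen object by nearest-neighbor search over stored visual embeddings. All names (KnowledgeBase, retrieve) and the 512-dimensional random embeddings are illustrative assumptions; in practice the embeddings would come from a vision encoder.

```python
import numpy as np

class KnowledgeBase:
    """Illustrative store of (visual embedding, language description) pairs."""
    def __init__(self):
        self.embeddings = []
        self.descriptions = []

    def add(self, embedding, description):
        # Unit-normalize so that a dot product equals cosine similarity.
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.descriptions.append(description)

    def retrieve(self, query, k=3):
        # Cosine similarity between a query region embedding and all entries.
        query = query / np.linalg.norm(query)
        sims = np.stack(self.embeddings) @ query
        top = np.argsort(sims)[::-1][:k]
        return [(self.descriptions[i], float(sims[i])) for i in top]

# Random vectors stand in for encoder outputs (e.g. from a CLIP-style model).
rng = np.random.default_rng(0)
kb = KnowledgeBase()
for desc in ["a red double-decker bus",
             "a two-wheeled electric scooter",
             "a corgi with short legs"]:
    kb.add(rng.normal(size=512), desc)

region = rng.normal(size=512)    # embedding of an unseen object region
print(kb.retrieve(region, k=2))  # descriptions passed on to a caption decoder
```

In a retrieval-augmented captioner, the returned descriptions would condition the decoder, so new objects can be described by adding entries to the knowledge base rather than by retraining the model.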
Free Research Field | Computer vision
Academic Significance and Societal Importance of the Research Achievements | We showed the efficacy of an external knowledge base in helping AI understand up-to-date object knowledge and predict future events from a sparse sequence of temporally ordered images. We also showed that generative AI can be trained with only a limited amount of training data.
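To make the limited-data setting concrete, the following sketch shows the differentiable-augmentation recipe commonly used to train GANs on small datasets (e.g. DiffAugment, Zhao et al., NeurIPS 2020). It is an assumed example rather than the project's actual method; D stands for any discriminator network, and the two augmentations are simplified stand-ins.

```python
import torch
import torch.nn.functional as F

def diff_augment(x):
    # Random brightness shift and small horizontal translation, applied to
    # whichever batch comes in. Both operations are differentiable, so
    # generator gradients flow through them.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * 0.2
    shift = int(torch.randint(-2, 3, (1,)))
    return torch.roll(x, shifts=shift, dims=3)

def d_loss(D, real, fake):
    # Non-saturating discriminator loss on augmented inputs: augmenting
    # real and fake images alike keeps the discriminator from simply
    # memorizing the few available real samples.
    return (F.softplus(-D(diff_augment(real))).mean()
            + F.softplus(D(diff_augment(fake))).mean())

def g_loss(D, fake):
    # The generator also sees the discriminator only through augmented images.
    return F.softplus(-D(diff_augment(fake))).mean()
```

Because the augmentations are applied inside the loss rather than to the dataset, the generator never learns to produce augmented-looking images, which is what makes this recipe viable when real data are scarce.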