2023 Fiscal Year Final Research Report
Vision-and-language cross-modal learning for training conditional GANs with long-tail data
Project/Area Number | 22K17947
Research Category | Grant-in-Aid for Early-Career Scientists
Allocation Type | Multi-year Fund
Review Section | Basic Section 61030: Intelligent informatics-related
Research Institution | The University of Tokyo
Principal Investigator | VO MinhDuc, The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (40939906)
Project Period (FY) | 2022-04-01 – 2024-03-31
Keywords | Vision and language / Novel object captioning / GANs / External knowledge
Outline of Final Research Achievements | This study investigated cross-modality between the vision and language spaces. We built a knowledge base that pairs objects' visual appearances with their corresponding language descriptions, and we demonstrated its efficacy in enhancing the description of unseen objects and the prediction of future events. We also explored new paradigms for training generative adversarial networks (GANs) on limited and open-set datasets, as well as GAN inversion. These results demonstrate that a generative model can be trained even when enough data to train a generative AI cannot always be harvested.
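As an illustration of the knowledge-base idea above, here is a minimal sketch (not the project's actual code) of retrieving language descriptions for an unseen object by nearest-neighbor search over stored visual embeddings. All names (KnowledgeBase, retrieve) and the 512-dimensional random embeddings are illustrative assumptions; in practice the embeddings would come from a vision encoder.

```python
import numpy as np

class KnowledgeBase:
    """Illustrative store of (visual embedding, language description) pairs."""
    def __init__(self):
        self.embeddings = []
        self.descriptions = []

    def add(self, embedding, description):
        # Unit-normalize so that a dot product equals cosine similarity.
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.descriptions.append(description)

    def retrieve(self, query, k=3):
        # Cosine similarity between a query region embedding and all entries.
        query = query / np.linalg.norm(query)
        sims = np.stack(self.embeddings) @ query
        top = np.argsort(sims)[::-1][:k]
        return [(self.descriptions[i], float(sims[i])) for i in top]

# Random vectors stand in for encoder outputs (e.g. from a CLIP-style model).
rng = np.random.default_rng(0)
kb = KnowledgeBase()
for desc in ["a red double-decker bus",
             "a two-wheeled electric scooter",
             "a corgi with short legs"]:
    kb.add(rng.normal(size=512), desc)

region = rng.normal(size=512)    # embedding of an unseen object region
print(kb.retrieve(region, k=2))  # descriptions passed on to a caption decoder
```

In a retrieval-augmented captioner, the returned descriptions would condition the decoder, so new objects can be described by adding entries to the knowledge base rather than by retraining the model.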
Free Research Field | Computer vision
Academic Significance and Societal Importance of the Research Achievements | We showed the efficacy of an external knowledge base in helping AI understand up-to-date object knowledge and predict future events from a sparse sequence of temporally ordered images. We also showed that generative AI can be trained with only a limited amount of training data.
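To make the limited-data setting concrete, the following sketch shows the differentiable-augmentation recipe commonly used to train GANs on small datasets (e.g. DiffAugment, Zhao et al., NeurIPS 2020). It is an assumed example rather than the project's actual method; D stands for any discriminator network, and the two augmentations are simplified stand-ins.

```python
import torch
import torch.nn.functional as F

def diff_augment(x):
    # Random brightness shift and small horizontal translation, applied to
    # whichever batch comes in. Both operations are differentiable, so
    # generator gradients flow through them.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * 0.2
    shift = int(torch.randint(-2, 3, (1,)))
    return torch.roll(x, shifts=shift, dims=3)

def d_loss(D, real, fake):
    # Non-saturating discriminator loss on augmented inputs: augmenting
    # real and fake images alike keeps the discriminator from simply
    # memorizing the few available real samples.
    return (F.softplus(-D(diff_augment(real))).mean()
            + F.softplus(D(diff_augment(fake))).mean())

def g_loss(D, fake):
    # The generator also sees the discriminator only through augmented images.
    return F.softplus(-D(diff_augment(fake))).mean()
```

Because the augmentations are applied inside the loss rather than to the dataset, the generator never learns to produce augmented-looking images, which is what makes this recipe viable when real data are scarce.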