
Vision and language cross-modal for training conditional GANs with long-tail data.

Research Project

Project/Area Number 22K17947
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type: Multi-year Fund
Review Section: Basic Section 61030: Intelligent informatics-related
Research Institution: The University of Tokyo

Principal Investigator

VO Minh Duc  The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (40939906)

Project Period (FY) 2022-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords: Vision and language / Novel object captioning / GANs / External knowledge / Bias mitigation / Story evaluation / Dataset / Conditional GANs / Long-tail data
Outline of Research at the Start

1) Creating a dataset for our study, because existing datasets are insufficient.
2) Constructing a vision-language cross-modal space by learning cross-modal similarity (a sketch of one such objective follows this list).
3) Learning data augmentation using the vision-language cross-modal space.
4) Incorporating the vision-language cross-modal space into conditional GANs.
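The report does not specify the similarity objective used in step 2; a common choice for learning cross-modal similarity is a symmetric InfoNCE (CLIP-style) contrastive loss. The following is a minimal sketch under that assumption; all names in it are illustrative, not the project's actual code.

```python
# Hedged sketch: a symmetric InfoNCE (CLIP-style) contrastive loss for
# learning vision-language cross-modal similarity. This is an assumed
# baseline objective, not the project's published method.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (B, D) embeddings of B matched image/text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = matched pairs
    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Trained this way, matched image/text pairs move together in a shared space, which is the property steps 3 and 4 rely on.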

Outline of Final Research Achievements

This study gained knowledge about the cross-modality between the vision and language spaces. We built a knowledge base containing objects' visual appearances and their corresponding language descriptions. We demonstrated the efficacy of the collected knowledge base in enhancing the ability to describe unseen objects and to predict the future.
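To make the knowledge-base idea concrete: in the retrieval-based captioning works listed below (NOC-REK, EVCap), object names are looked up by visual similarity against a stored memory. The sketch below assumes a simple nearest-neighbor lookup; the identifiers (memory_embs, memory_names, region_emb) are hypothetical.

```python
# Hedged sketch of retrieval from an external visual-name knowledge base,
# in the spirit of the retrieval-augmented captioning papers listed below.
# All identifiers here are hypothetical, not the papers' actual API.
import torch
import torch.nn.functional as F

def retrieve_object_names(region_emb: torch.Tensor,
                          memory_embs: torch.Tensor,
                          memory_names: list[str], k: int = 5) -> list[str]:
    """Return the k object names whose stored visual embeddings are most
    cosine-similar to a query image-region embedding."""
    region_emb = F.normalize(region_emb, dim=-1)    # (D,) query embedding
    memory_embs = F.normalize(memory_embs, dim=-1)  # (N, D) stored embeddings
    sims = memory_embs @ region_emb                 # (N,) cosine similarities
    top = sims.topk(k).indices
    return [memory_names[int(i)] for i in top]      # vocabulary hints for the captioner
```

The retrieved names can then be handed to a caption decoder as an up-to-date vocabulary, which is how such a memory lets a captioner describe objects unseen at training time.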
We also explored new paradigms for training generative adversarial networks on limited and open-set datasets, as well as GAN inversion. This demonstrated that a generative model can be trained even when we cannot always harvest enough data to train a generative AI.
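For background on the GAN side, the sketch below shows a standard conditional-GAN hinge objective, one common starting point for the limited-data and open-set training paradigms mentioned above; it is an assumed baseline, not the specific loss of any paper below.

```python
# Hedged sketch: standard hinge losses for a (class-)conditional GAN,
# an assumed baseline rather than the specific method of the papers below.
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Discriminator loss on class-conditioned real/fake logits."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Generator loss: raise the discriminator's logits on fakes."""
    return -d_fake.mean()
```

The long-tail and noisy-label papers listed below build on such conditional objectives, changing how labels and unlabeled data enter training.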

Academic Significance and Societal Importance of the Research Achievements

We showed the efficacy of an external knowledge base in helping AI understand up-to-date object knowledge and predict the future given a sparse sequence of temporally ordered images. We also showed that generative AI can be trained with a limited number of training samples.

Report (3 results)
  • 2023 Annual Research Report
  • Final Research Report (PDF)
  • 2022 Research-status Report
Research Products (18 results)


Journal Article: 10 results (of which Int'l Joint Research: 10, Peer Reviewed: 10, Open Access: 9)
Presentation: 8 results (of which Int'l Joint Research: 8)

  • [Journal Article] Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Harada Tatsuya, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 5311-5320

    • DOI

      10.1109/wacv57701.2024.00524

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Revisiting Latent Space of GAN Inversion for Robust Real Image Editing2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Liu Bei, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 5301-5310

    • DOI

      10.1109/wacv57701.2024.00523

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Label Augmentation as Inter-class Data Augmentation for Conditional Image Synthesis with Imbalanced Data2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 4932-4941

    • DOI

      10.1109/wacv57701.2024.00487

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension2024

    • Author(s)
Li Jiaxuan, Vo Duc Minh, Sugimoto Akihiro, Nakayama Hideki
    • Journal Title

      2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

      Volume: 1

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts2023

    • Author(s)
Li Jiaxuan, Vo Duc Minh, Nakayama Hideki
    • Journal Title

      2023 IEEE/CVF International Conference on Computer Vision (ICCV)

      Volume: 1 Pages: 4901-4911

    • DOI

      10.1109/iccv51070.2023.00454

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] A-CAP: Anticipation Captioning with Commonsense Knowledge2023

    • Author(s)
Vo Duc Minh, Luong Quoc-An, Sugimoto Akihiro, Nakayama Hideki
    • Journal Title

      2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

      Volume: 1 Pages: 10824-10833

    • DOI

      10.1109/cvpr52729.2023.01042

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs2023

    • Author(s)
      Rui Yang, Duc Minh Vo, Hideki Nakayama
    • Journal Title

      Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

      Volume: - Pages: 4641-4650

    • DOI

      10.1109/wacv56688.2023.00463

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge2022

    • Author(s)
      Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
    • Journal Title

      2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

      Volume: - Pages: 17979-17987

    • DOI

      10.1109/cvpr52688.2022.01747

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Stochastically Flipping Labels of Discriminator’s Outputs for Training Generative Adversarial Networks2022

    • Author(s)
      Rui Yang, Duc Minh Vo, Hideki Nakayama
    • Journal Title

      IEEE Access

      Volume: 10 Pages: 103644-103654

    • DOI

      10.1109/access.2022.3210130

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning2022

    • Author(s)
      Hong Chen, Duc Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama
    • Journal Title

      Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

      Volume: - Pages: 1739-1753

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Harada Tatsuya, Nakayama Hideki
    • Organizer
      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Revisiting Latent Space of GAN Inversion for Robust Real Image Editing2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Liu Bei, Nakayama Hideki
    • Organizer
      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Label Augmentation as Inter-class Data Augmentation for Conditional Image Synthesis with Imbalanced Data2024

    • Author(s)
Katsumata Kai, Vo Duc Minh, Nakayama Hideki
    • Organizer
      2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts2023

    • Author(s)
Li Jiaxuan, Vo Duc Minh, Nakayama Hideki
    • Organizer
      2023 IEEE/CVF International Conference on Computer Vision (ICCV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A-CAP: Anticipation Captioning with Commonsense Knowledge2023

    • Author(s)
Vo Duc Minh, Luong Quoc-An, Sugimoto Akihiro, Nakayama Hideki
    • Organizer
      2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Indirect Adversarial Losses via an Intermediate Distribution for Training GANs2023

    • Author(s)
      Yang Rui、Vo Duc Minh、Nakayama Hideki
    • Organizer
      2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge2022

    • Author(s)
      Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
    • Organizer
      2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning2022

    • Author(s)
      Hong Chen, Duc Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama
    • Organizer
      2022 Conference on Empirical Methods in Natural Language Processing
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research


Published: 2022-04-19   Modified: 2025-01-30  
