
Zero-shot Cross-modal Embedding Learning

Research Project

Project/Area Number 19K11987
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 60080: Database-related
Research Institution National Institute of Informatics

Principal Investigator

Yu Yi  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (00754681)

Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Keywords Cross-Modal Correlation / Cross-Modal Embedding / cross-modal embedding / zero-shot / cross-modal retrieval
Outline of Research at the Start

Much effort has been devoted to learning cross-modal correlations between data in different modalities. However, existing cross-modal embedding models usually do not work well when the query or the database includes new data from unknown categories. To solve this problem, this project aims to develop zero-shot cross-modal embedding learning algorithms along the following lines: (i) computing modality-invariant embeddings, (ii) predicting unknown categories based on external knowledge describing their correlation with known categories, and (iii) applying adversarial learning to enhance system performance.
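
As an illustration of points (i) and (iii), the sketch below shows the general pattern of mapping two modalities into a shared embedding space while an adversarial modality discriminator pushes the encoders toward modality-invariant embeddings. This is a minimal PyTorch sketch under assumed feature dimensions and layer sizes, not the project's actual implementation.

```python
# Minimal sketch: two modality encoders plus an adversarial modality discriminator.
# All dimensions and layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))
    def forward(self, x):
        return self.net(x)

audio_enc = Encoder(in_dim=128)     # hypothetical audio feature dimension
visual_enc = Encoder(in_dim=2048)   # hypothetical visual feature dimension
disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))  # predicts modality

bce = nn.BCEWithLogitsLoss()
audio, visual = torch.randn(8, 128), torch.randn(8, 2048)   # dummy batch
za, zv = audio_enc(audio), visual_enc(visual)

# The discriminator tries to tell audio (label 0) from visual (label 1) embeddings;
# the encoders are trained to fool it, encouraging a modality-invariant shared space.
d_loss = bce(disc(za.detach()), torch.zeros(8, 1)) + bce(disc(zv.detach()), torch.ones(8, 1))
g_loss = bce(disc(za), torch.ones(8, 1)) + bce(disc(zv), torch.zeros(8, 1))
```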

Outline of Final Research Achievements

This project focused on cross-modal embedding learning for cross-modal retrieval. The main challenge is learning joint embeddings in a shared subspace so that similarity can be computed across different modalities. 1) We proposed a novel deep triplet neural network with cluster canonical correlation analysis (TNN-C-CCA), an end-to-end supervised learning architecture with an audio branch and a video branch. 2) We proposed a novel variational autoencoder (VAE) architecture for audio-visual cross-modal retrieval, which learns paired audio-visual correlation embeddings and category correlation embeddings as constraints to reinforce the mutuality of audio-visual information. 3) We proposed an unsupervised generative adversarial alignment representation (UGAAR) model that learns deep discriminative representations shared across three major musical modalities, namely sheet music, lyrics, and audio, by jointly training a deep neural network architecture with three branches.
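
As an illustration of the triplet objective behind result 1), the sketch below pulls an anchor audio embedding toward its paired video embedding and pushes it away from an unpaired one. The cluster-CCA component of TNN-C-CCA is omitted, and the embedding dimension and margin are assumed values, so this is only a minimal sketch of the general approach.

```python
# Minimal triplet-loss sketch for audio-visual embeddings (dummy data, assumed sizes).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    # cosine distance = 1 - cosine similarity
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

audio_emb = torch.randn(8, 128)           # anchor: audio-branch output
video_pos = torch.randn(8, 128)           # positive: paired video embedding
video_neg = video_pos[torch.randperm(8)]  # negative: shuffled, unpaired videos
loss = triplet_loss(audio_emb, video_pos, video_neg)
```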

Academic Significance and Societal Importance of the Research Achievements

The distributions of data in different modalities are inconsistent, which makes it difficult to measure similarity across modalities directly. The proposed cross-modal embedding learning techniques can help improve the performance of cross-modal retrieval, recognition, and generation.
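
Once both modalities are embedded in a shared subspace, cross-modal retrieval reduces to nearest-neighbour search over that space, for example ranking video embeddings by cosine similarity to an audio query. The snippet below is a minimal illustration with dummy data and assumed dimensions.

```python
# Cross-modal retrieval in a shared subspace: rank gallery items by cosine similarity.
import torch
import torch.nn.functional as F

query = F.normalize(torch.randn(1, 128), dim=1)        # one audio query embedding
gallery = F.normalize(torch.randn(1000, 128), dim=1)   # 1000 video embeddings
scores = query @ gallery.t()                           # cosine similarities
top10 = scores.topk(10, dim=1).indices                 # indices of the best matches
```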

Report (4 results)
  • 2021 Annual Research Report / Final Research Report (PDF)
  • 2020 Research-status Report
  • 2019 Research-status Report
  • Research Products (5 results)

Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (3 results, of which Int'l Joint Research: 2) / Remarks (1 result)

  • [Journal Article] Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-Modal Retrieval (2020)

    • Author(s)
      Donghuo Zeng, Yi Yu, Keizo Oyama
    • Journal Title

      ACM Transactions on Multimedia Computing, Communications, and Applications

      Volume: 16 Pages: 1-23

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Presentation] Melody generation from lyrics using three branch conditional LSTM-GAN (2022)

    • Author(s)
      Abhishek Srivastava, Wei Duan, Rajiv Ratn Shah, Jianming Wu, Suhua Tang, Wei Li, and Yi Yu
    • Organizer
      28th International Conference on Multimedia Modeling (MMM)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Unsupervised generative adversarial alignment representation for sheet music, audio and lyrics (2020)

    • Author(s)
      Donghuo Zeng, Yi Yu, and Keizo Oyama
    • Organizer
      IEEE International Conference on Multimedia Big Data 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] MusicTM-Dataset for joint representation learning among sheet music, lyrics, and musical audio (2020)

    • Author(s)
      Donghuo Zeng, Yi Yu, and Keizo Oyama
    • Organizer
      The 8th Conference on Sound and Music Technology, Lecture Notes in Electrical Engineering
    • Related Report
      2020 Research-status Report
  • [Remarks]

    • URL

      https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation

    • Related Report
      2019 Research-status Report

Published: 2019-04-18   Modified: 2023-01-30  
