2020 Fiscal Year Research-status Report
Zero-shot Cross-modal Embedding Learning
Project/Area Number | 19K11987 |
Research Institution | National Institute of Informatics |
Principal Investigator | Yu Yi (National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor) (00754681) |
Project Period (FY) | 2019-04-01 – 2022-03-31 |
Keywords | Cross-Modal Correlation / Cross-Modal Embedding |
Outline of Annual Research Achievements |
The main challenge of representation learning across different modalities is the heterogeneity gap. A classical line of work is the CCA-based approaches, which aim to find transformations that maximize the correlation between paired inputs drawn from two different variable sets. We propose an unsupervised generative adversarial alignment representation (UGAAR) model to learn deep discriminative representations shared across three major musical modalities: sheet music, lyrics, and audio, where a deep neural network architecture with three branches is trained jointly. In particular, the proposed model can transfer the strong relationship between audio and sheet music to audio-lyrics and sheet-lyrics pairs by learning the correlation in the latent shared subspace. We apply CCA components of audio and sheet music to establish new ground truth. The generative (G) model learns the correlation of the two transferred pairs to generate a new audio-sheet pair for fixed lyrics, challenging the discriminative (D) model, whose task is to distinguish whether its input comes from the generative model or from the ground truth. The two models are trained simultaneously in an adversarial way to strengthen deep alignment representation learning. Our experimental results demonstrate the feasibility of the proposed UGAAR for alignment representation learning among sheet music, audio, and lyrics.
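As a rough illustration only (not the project's actual implementation), the following PyTorch sketch shows an adversarial alignment scheme of the kind described above: a generator maps a lyrics embedding to a synthetic audio-sheet pair, while a discriminator separates it from a CCA-aligned real pair. All layer sizes, encoder architectures, and input features are assumptions made for the example.

```python
# Hedged sketch of UGAAR-style adversarial alignment (assumed dimensions/encoders).
import torch
import torch.nn as nn

EMB = 128                                     # shared embedding size (assumed)
AUDIO_D, SHEET_D, LYRICS_D = 512, 512, 300    # hypothetical input feature sizes

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_out))

# One encoder branch per modality, projecting into the shared subspace.
enc_audio, enc_sheet, enc_lyrics = mlp(AUDIO_D, EMB), mlp(SHEET_D, EMB), mlp(LYRICS_D, EMB)

G = mlp(EMB, 2 * EMB)   # generator: lyrics embedding -> concatenated audio-sheet pair
D = mlp(2 * EMB, 1)     # discriminator: real (CCA-aligned) pair vs. generated pair

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(list(G.parameters()) + list(enc_lyrics.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(list(D.parameters()) + list(enc_audio.parameters())
                         + list(enc_sheet.parameters()), lr=1e-4)

for step in range(1000):
    # Stand-in batch: in practice these would be CCA-projected audio/sheet features
    # and lyrics features belonging to the same songs.
    audio = torch.randn(32, AUDIO_D)
    sheet = torch.randn(32, SHEET_D)
    lyrics = torch.randn(32, LYRICS_D)

    real_pair = torch.cat([enc_audio(audio), enc_sheet(sheet)], dim=1)
    fake_pair = G(enc_lyrics(lyrics))

    # Discriminator update: push real pairs toward 1, generated pairs toward 0.
    d_loss = bce(D(real_pair), torch.ones(32, 1)) + \
             bce(D(fake_pair.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: produce pairs the discriminator labels as real.
    g_loss = bce(D(G(enc_lyrics(lyrics))), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```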
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The research is progressing well. Significant results have been published at an international conference, the correlation among sheet music, audio, and lyrics has been learned, and several baseline methods have been investigated.
Strategy for Future Research Activity |
Future work will aim to develop novel cross-modal learning algorithms along the following lines: (i) develop transformer-based techniques to learn stronger correlations, (ii) develop multimodal metric learning to enhance system performance (a minimal sketch follows below), and (iii) carry out extensive experiments comparing against other existing state-of-the-art methods.
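As an illustration of item (ii) only, the sketch below shows one possible form of multimodal metric learning: a triplet margin loss that pulls an audio embedding toward its matching lyrics embedding and away from a mismatched one. The encoders, dimensions, and negative-sampling strategy are assumptions for the example, not the project's design.

```python
# Hedged sketch of a cross-modal triplet margin loss (assumed encoders/dimensions).
import torch
import torch.nn as nn

emb_audio = nn.Linear(512, 128)    # hypothetical audio branch
emb_lyrics = nn.Linear(300, 128)   # hypothetical lyrics branch
triplet = nn.TripletMarginLoss(margin=0.2)

audio = torch.randn(32, 512)
lyrics_pos = torch.randn(32, 300)              # matching lyrics features
lyrics_neg = lyrics_pos[torch.randperm(32)]    # shuffled within-batch negatives

loss = triplet(emb_audio(audio), emb_lyrics(lyrics_pos), emb_lyrics(lyrics_neg))
loss.backward()
```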
Causes of Carryover |
Because of the coronavirus pandemic, it was not possible to attend international conferences held overseas. The remaining funds are planned to be used in the next fiscal year for conference registration fees and paper publication charges.