Summary of Research Achievements
The main challenge of representation learning across different modalities is the heterogeneity gap. A classical line of methods is the family of CCA-based approaches, which aim to find transformations that maximize the correlation between input pairs drawn from two different variable sets. We propose an unsupervised generative adversarial alignment representation (UGAAR) model to learn deep discriminative representations shared across three major musical modalities: sheet music, lyrics, and audio, in which a three-branch deep neural network architecture is trained jointly. In particular, the proposed model can transfer the strong relationship between audio and sheet music to audio-lyrics and sheet-lyrics pairs by learning the correlation in a shared latent subspace. We apply CCA components of audio and sheet music to establish new ground truth. The generative (G) model learns the correlation of the two transferred pairs to generate a new audio-sheet pair for fixed lyrics, challenging the discriminative (D) model. The discriminative model aims to distinguish whether its input comes from the generative model or from the ground truth. The two models are trained simultaneously in an adversarial manner to strengthen the learned alignment representations. Our experimental results demonstrate the feasibility of the proposed UGAAR for alignment representation learning among sheet music, audio, and lyrics.
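To make the CCA ground-truth step concrete, the sketch below projects precomputed audio and sheet-music embeddings into a shared correlated subspace; the projected pairs then serve as the "new ground truth" the discriminator compares against. This is a minimal sketch, not the paper's implementation: the embedding arrays, their dimensions, and the number of CCA components are all assumptions for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical precomputed branch embeddings: one row per aligned piece.
rng = np.random.default_rng(0)
audio_emb = rng.standard_normal((500, 128))   # audio-branch features (assumed)
sheet_emb = rng.standard_normal((500, 128))   # sheet-music-branch features (assumed)

# Fit CCA on the strongly related audio-sheet pairs and project both
# views into the shared correlated subspace.
cca = CCA(n_components=32)
audio_c, sheet_c = cca.fit_transform(audio_emb, sheet_emb)

# The correlated components act as the new ground-truth audio-sheet
# pairs that the discriminator later contrasts with generated pairs.
ground_truth_pairs = np.concatenate([audio_c, sheet_c], axis=1)  # shape (500, 64)
```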
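The adversarial interplay between G and D can likewise be illustrated with a minimal PyTorch training step. This is a hedged sketch under assumed names and shapes: `generator` maps a lyrics embedding to a concatenated audio-sheet pair (e.g., the 64-dimensional CCA pair from the sketch above), `discriminator` scores pair authenticity, and all modules and dimensions are hypothetical stand-ins rather than the actual UGAAR architecture.

```python
import torch
import torch.nn as nn

LYRICS_DIM, PAIR_DIM = 64, 64  # hypothetical embedding sizes

# G: generates an audio-sheet pair conditioned on a fixed lyrics embedding.
generator = nn.Sequential(nn.Linear(LYRICS_DIM, 128), nn.ReLU(), nn.Linear(128, PAIR_DIM))
# D: judges whether an audio-sheet pair is ground truth or generated.
discriminator = nn.Sequential(nn.Linear(PAIR_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(lyrics_emb, real_pair):
    """One adversarial update: D learns to separate ground-truth pairs
    from generated ones; G learns to fool D for the same fixed lyrics."""
    # --- discriminator update (generated pair is detached from G) ---
    fake_pair = generator(lyrics_emb).detach()
    d_loss = (bce(discriminator(real_pair), torch.ones(real_pair.size(0), 1)) +
              bce(discriminator(fake_pair), torch.zeros(fake_pair.size(0), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- generator update (tries to make D label its pair as real) ---
    fake_pair = generator(lyrics_emb)
    g_loss = bce(discriminator(fake_pair), torch.ones(lyrics_emb.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage with random stand-in batches:
lyrics = torch.randn(16, LYRICS_DIM)
real = torch.randn(16, PAIR_DIM)   # stands in for CCA ground-truth pairs
print(train_step(lyrics, real))
```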
Strategy for Future Research Activity
Future work will aim to develop novel cross-modal learning algorithms along the following lines: (i) developing transformer-based techniques to learn stronger correlations, (ii) developing multimodal metric learning to enhance system performance, and (iii) conducting extensive experiments to compare against existing state-of-the-art methods.