2020 Fiscal Year Research-status Report
Zero-shot Cross-modal Embedding Learning
Project/Area Number | 19K11987 |
Research Institution | National Institute of Informatics |
Principal Investigator | Yu Yi (National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor) (00754681) |
Project Period (FY) | 2019-04-01 – 2022-03-31 |
Keywords | Cross-Modal Correlation / Cross-Modal Embedding |
Outline of Annual Research Achievements |
The main challenge of representation learning across different modalities is the heterogeneity gap. A classical line of work is the CCA-based approaches, which aim to find transformations that maximize the correlation between paired inputs drawn from two different variable sets. We propose an unsupervised generative adversarial alignment representation (UGAAR) model to learn deep discriminative representations shared across three major musical modalities: sheet music, lyrics, and audio, where a deep neural network architecture with three branches is trained jointly. In particular, the proposed model can transfer the strong relationship between audio and sheet music to audio-lyrics and sheet-lyrics pairs by learning the correlation in the latent shared subspace. We apply CCA components of audio and sheet music to establish new ground truth. The generative (G) model learns the correlation of the two transferred pairs to generate a new audio-sheet pair for fixed lyrics, challenging the discriminative (D) model, whose task is to distinguish whether its input comes from the generative model or from the ground truth. The two models are trained simultaneously in an adversarial way to strengthen deep alignment representation learning. Our experimental results demonstrate the feasibility of the proposed UGAAR for alignment representation learning among sheet music, audio, and lyrics.
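As a rough illustration only (not the project's actual implementation), the following PyTorch sketch shows an adversarial alignment scheme of the kind described above: a generator maps a lyrics embedding to a synthetic audio-sheet pair, while a discriminator separates it from a CCA-aligned real pair. All layer sizes, encoder architectures, and input features are assumptions made for the example.

```python
# Hedged sketch of UGAAR-style adversarial alignment (assumed dimensions/encoders).
import torch
import torch.nn as nn

EMB = 128                                     # shared embedding size (assumed)
AUDIO_D, SHEET_D, LYRICS_D = 512, 512, 300    # hypothetical input feature sizes

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_out))

# One encoder branch per modality, projecting into the shared subspace.
enc_audio, enc_sheet, enc_lyrics = mlp(AUDIO_D, EMB), mlp(SHEET_D, EMB), mlp(LYRICS_D, EMB)

G = mlp(EMB, 2 * EMB)   # generator: lyrics embedding -> concatenated audio-sheet pair
D = mlp(2 * EMB, 1)     # discriminator: real (CCA-aligned) pair vs. generated pair

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(list(G.parameters()) + list(enc_lyrics.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(list(D.parameters()) + list(enc_audio.parameters())
                         + list(enc_sheet.parameters()), lr=1e-4)

for step in range(1000):
    # Stand-in batch: in practice these would be CCA-projected audio/sheet features
    # and lyrics features belonging to the same songs.
    audio = torch.randn(32, AUDIO_D)
    sheet = torch.randn(32, SHEET_D)
    lyrics = torch.randn(32, LYRICS_D)

    real_pair = torch.cat([enc_audio(audio), enc_sheet(sheet)], dim=1)
    fake_pair = G(enc_lyrics(lyrics))

    # Discriminator update: push real pairs toward 1, generated pairs toward 0.
    d_loss = bce(D(real_pair), torch.ones(32, 1)) + \
             bce(D(fake_pair.detach()), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: produce pairs the discriminator labels as real.
    g_loss = bce(D(G(enc_lyrics(lyrics))), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```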
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The research is progressing well. Significant results have been published at an international conference, the correlation among sheet music, audio, and lyrics has been learned, and several baseline methods have been investigated.
Strategy for Future Research Activity |
Future work will aim to develop novel cross-modal learning algorithms along the following lines: (i) develop transformer-based techniques to learn stronger correlations, (ii) develop multimodal metric learning to enhance system performance (a minimal sketch follows below), and (iii) carry out extensive experiments comparing against other existing state-of-the-art methods.
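As an illustration of item (ii) only, the sketch below shows one possible form of multimodal metric learning: a triplet margin loss that pulls an audio embedding toward its matching lyrics embedding and away from a mismatched one. The encoders, dimensions, and negative-sampling strategy are assumptions for the example, not the project's design.

```python
# Hedged sketch of a cross-modal triplet margin loss (assumed encoders/dimensions).
import torch
import torch.nn as nn

emb_audio = nn.Linear(512, 128)    # hypothetical audio branch
emb_lyrics = nn.Linear(300, 128)   # hypothetical lyrics branch
triplet = nn.TripletMarginLoss(margin=0.2)

audio = torch.randn(32, 512)
lyrics_pos = torch.randn(32, 300)              # matching lyrics features
lyrics_neg = lyrics_pos[torch.randperm(32)]    # shuffled within-batch negatives

loss = triplet(emb_audio(audio), emb_lyrics(lyrics_pos), emb_lyrics(lyrics_neg))
loss.backward()
```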
Causes of Carryover |
Because of the coronavirus pandemic, it was not possible to attend international conferences held overseas. The remaining funds are planned to be used in the next fiscal year for conference registration fees and paper publication charges.