Audio-Visual Music Understanding Based on Integration of Recognition and Generative Processes

Research Project

Project/Area Number 19H04137
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution Kyoto University

Principal Investigator

Yoshii Kazuyoshi  Kyoto University, Graduate School of Informatics, Associate Professor (20510001)

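Co-Investigator (Kenkyū-buntansha) Kawahara Tatsuya (河原 達也)  Kyoto University, Graduate School of Informatics, Professor (00234104)
Morishima Shigeo (森島 繁生)  Waseda University, Faculty of Science and Engineering, Professor (10200411)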
Project Period (FY) 2019-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥17,160,000 (Direct Cost: ¥13,200,000, Indirect Cost: ¥3,960,000)
Fiscal Year 2022: ¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2020: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2019: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
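Keywords automatic music transcription / automatic arrangement / pose estimation / probabilistic generative models / deep learning / amortized variational inference / music information processing / signal processing / symbolic processing / Bayesian learning / acoustic signal processing / speech processing / image processing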
Outline of Research at the Start

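We believe that the core of music understanding lies in acquiring a sense of music and a sense of the body. Without any special training, humans acquire from real experience internal senses that are hard to put into words, such as what music is like and how the body can be moved. Thanks to these senses, they can transcribe musically coherent scores and imagine 3D poses from dance videos. We aim to realize this mechanism on a computer, to break through the performance limits of various recognition and generation tasks on audio and video data, and to work toward a constructive account of human music understanding.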

Outline of Final Research Achievements

For auditory understanding, we developed a unified approach, based on statistical inference in probabilistic generative models, to several important subtasks of automatic music transcription: singing voice transcription, music structure analysis, chord and key estimation, and drum transcription. We showed that the generative and inference models can be integrated within the variational autoencoder (VAE) framework. For visual understanding, we developed a pose estimation method based on the same approach.
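
As a rough illustration of what integrating a generative model and an inference model in the VAE framework usually means, the standard VAE training objective (the evidence lower bound) can be written as below. The symbols are assumptions made for illustration and are not taken from the project reports: x stands for the observed audio or video data, z for the latent representation (for example, a musical score or a body pose), p_θ for the generative model, and q_φ for the inference model.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\middle\|\, p(z)\right)
```

Maximizing the right-hand side trains both models jointly: the first term rewards latent variables from which the observation can be regenerated, and the second keeps the inferred latent variables close to the prior p(z). The project's actual models may use different or more structured objectives.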

Academic Significance and Societal Importance of the Research Achievements

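We presented a computational model of the mechanism by which humans understand music through sight and hearing, integrating the generative and inference processes, which are two sides of the same coin. The model is inspired by the mirror neuron hypothesis known in cognitive science, and we showed that, from the standpoint of statistical machine learning, it can be formulated as a variational autoencoder (VAE). We demonstrated its effectiveness on several automatic music transcription tasks and on pose estimation.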

Report

(5 results)
  • 2022 Annual Research Report / Final Research Report (PDF)
  • 2021 Annual Research Report
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • Research Products

    (46 results)

Journal Article (14 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 14 results, Open Access: 6 results) / Presentation (32 results) (of which Int'l Joint Research: 32 results)

  • [Journal Article] Joint Chord and Key Estimation Based on a Hierarchical Variational Autoencoder with Multi-task Learning (2022)

    • Author(s)
      Wu Yiming, Yoshii Kazuyoshi
    • Journal Title

      APSIPA Transactions on Signal and Information Processing

      Volume: 11 Issue: 1 Pages: 1-27

    • DOI

      10.1561/116.00000052

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation (2022)

    • Author(s)
      Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi, Kawahara Tatsuya
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 2368-2382

    • DOI

      10.1109/taslp.2022.3190734

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation (2022)

    • Author(s)
      Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 1734-1748

    • DOI

      10.1109/taslp.2022.3172631

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms (2021)

    • Author(s)
      Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii
    • Journal Title

      Signals

      Volume: 2 Issue: 3 Pages: 508-526

    • DOI

      10.3390/signals2030031

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] MirrorNet: A Deep Reflective Approach to 2D Pose Estimation for Single-Person Images (2021)

    • Author(s)
      Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima
    • Journal Title

      Journal of Information Processing

      Volume: 29 Issue: 0 Pages: 406-423

    • DOI

      10.2197/ipsjjip.29.406

    • NAID

      130008038621

    • ISSN
      1882-6652
    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions (2021)

    • Author(s)
      Eita Nakamura, Kazuyoshi Yoshii
    • Journal Title

      Information Sciences

      Volume: 572 Pages: 482-500

    • DOI

      10.1016/j.ins.2021.04.100

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Audio-to-Score Singing Transcription Based on a CRNN-HSMM Hybrid Model (2021)

    • Author(s)
      Ryo Nishikimi, Eita Nakamura, Masataka Goto, Kazuyoshi Yoshii
    • Journal Title

      APSIPA Transactions on Signal and Information Processing

      Volume: 10 Issue: 1 Pages: 1-13

    • DOI

      10.1017/atsip.2021.4

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription (2021)

    • Author(s)
      Kentaro Shibata, Eita Nakamura, Kazuyoshi Yoshii
    • Journal Title

      Information Sciences

      Volume: 566 Pages: 262-280

    • DOI

      10.1016/j.ins.2021.03.014

    • Related Report
      2021 Annual Research Report, 2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Bayesian Melody Harmonization Based on a Tree-Structured Generative Model of Chord Sequences and Melodies (2020)

    • Author(s)
      Hiroaki Tsushima, Eita Nakamura, Kazuyoshi Yoshii
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 1644-1655

    • DOI

      10.1109/taslp.2020.2996088

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories (2020)

    • Author(s)
      Nishikimi Ryo, Nakamura Eita, Goto Masataka, Itoyama Katsutoshi, Yoshii Kazuyoshi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 1678-1691

    • DOI

      10.1109/taslp.2020.2996095

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation (2020)

    • Author(s)
      Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 2610-2625

    • DOI

      10.1109/taslp.2020.3019181

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Semi-Supervised Neural Chord Estimation Based on a Variational Autoencoder With Latent Chord Labels and Features (2020)

    • Author(s)
      Yiming Wu, Tristan Carsault, Eita Nakamura, Kazuyoshi Yoshii
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 2956-2966

    • DOI

      10.1109/taslp.2020.3035001

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Flow-Based Independent Vector Analysis for Blind Source Separation (2020)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 27 Pages: 2173-2177

    • DOI

      10.1109/lsp.2020.3039944

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Statistical learning and estimation of piano fingering (2020)

    • Author(s)
      Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii
    • Journal Title

      Information Sciences

      Volume: 517 Pages: 68-85

    • DOI

      10.1016/j.ins.2019.12.068

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed
  • [Presentation] End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation (2022)

    • Author(s)
      Tengyu Deng, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Tracking the Evolution of a Band's Performances over Decades (2022)

    • Author(s)
      Florian Thalmann, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Difficulty-Aware Neural Band-to-Piano Score Arrangement Based on Note- and Statistic-Level Criteria (2022)

    • Author(s)
      Moyu Terao, Yuki Hiramatsu, Ryoto Ishizuka, Yiming Wu, Kazuyoshi Yoshii
    • Organizer
      IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments (2022)

    • Author(s)
      Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
    • Organizer
      IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (2022)

    • Author(s)
      Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF (2022)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Organizer
      IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Localization and Synchronization of Distributed Camera-Attached Microphone Arrays for Indoor Scene Analysis (2022)

    • Author(s)
      Yoshiaki Sumura, Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii
    • Organizer
      IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization (2022)

    • Author(s)
      Mathieu Fontaine, Diego Di Carlo, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2022)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Organizer
      IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Joint Estimation of Note Values and Voices for Audio-to-Score Piano Transcription (2021)

    • Author(s)
      Yuki Hiramatsu, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Phase-Aware Joint Beat and Downbeat Estimation Based on Periodicity of Metrical Structure (2021)

    • Author(s)
      Takehisa Oyama, Ryoto Ishizuka, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Statistical Correction of Transcribed Melody Notes Based on Probabilistic Integration of a Music Language Model and a Transcription Error Model (2021)

    • Author(s)
      Yuki Hiramatsu, Go Shibata, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Pitch-Timbre Disentanglement of Musical Instrument Sounds Based on VAE-Based Metric Learning (2021)

    • Author(s)
      Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Real-Time Drum-Wise Volume Visualization System for Learning Volume-Balanced Drum Performance (2021)

    • Author(s)
      Mitsuki Hosoya, Masanori Morise, Satoshi Nakamura, Kazuyoshi Yoshii
    • Organizer
      International Conference on Entertainment Computing (ICEC)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction (2020)

    • Author(s)
      Andrew McLeod, James Owers, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Method for Analysis of Shared Structure in Large Music Collections Using Techniques from Genetic Sequencing and Graph Theory (2020)

    • Author(s)
      Florian Thalmann, Kazuyoshi Yoshii, Geraint Wiggins, Mark B. Sandler
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multi-Instrument Music Transcription Based on Deep Spherical Clustering of Spectrograms and Pitchgrams (2020)

    • Author(s)
      Keitaro Tanaka, Takayuki Nakatsuka, Ryo Nishikimi, Kazuyoshi Yoshii, Shigeo Morishima
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Music Structure Analysis Based on an LSTM-HSMM Hybrid Model (2020)

    • Author(s)
      Go Shibata, Ryo Nishikimi, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization (2020)

    • Author(s)
      Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Kazuyoshi Yoshii
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder (2020)

    • Author(s)
      Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] End-to-End Music-Mixed Speech Recognition (2020)

    • Author(s)
      Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Organizer
      Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Variational Autoencoder for Joint Chord and Key Estimation from Audio Chromagrams (2020)

    • Author(s)
      Yiming Wu, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training (2020)

    • Author(s)
      Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Semi-supervised Multichannel Speech Separation Based on a Phone- and Speaker-Aware Deep Generative Model of Speech Spectrograms (2020)

    • Author(s)
      Yicheng Du, Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Fast Multichannel Correlated Tensor Factorization for Blind Source Separation (2020)

    • Author(s)
      Kazuyoshi Yoshii, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Aditya Arie Nugraha
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Audio-Guided Video Interpolation via Human Pose Features (2020)

    • Author(s)
      Takayuki Nakatsuka, Masatoshi Hamanaka, Shigeo Morishima
    • Organizer
      International Conference on Computer Vision Theory and Applications (VISAPP)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Statistical Music Structure Analysis Based on a Homogeneity-, Repetitiveness-, and Regularity-Aware Hierarchical Hidden Semi-Markov Model (2019)

    • Author(s)
      Go Shibata, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Blending Acoustic and Language Model Predictions for Automatic Music Transcription (2019)

    • Author(s)
      Adrien Ycart, Andrew McLeod, Emmanouil Benetos, Kazuyoshi Yoshii
    • Organizer
      International Society for Music Information Retrieval Conference (ISMIR)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] End-to-End Melody Note Transcription Based on a Beat-Synchronous Attention Mechanism (2019)

    • Author(s)
      Ryo Nishikimi, Eita Nakamura, Masataka Goto, Kazuyoshi Yoshii
    • Organizer
      IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Joint Singing Pitch Estimation and Voice Separation Based on a Neural Harmonic Structure Renderer (2019)

    • Author(s)
      Tomoyasu Nakano, Kazuyoshi Yoshii, Yiming Wu, Ryo Nishikimi, Kin Wah Edward Lin, Masataka Goto
    • Organizer
      IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Networks (2019)

    • Author(s)
      Tristan Carsault, Andrew McLeod, Philippe Esling, Jerome Nika, Eita Nakamura, Kazuyoshi Yoshii
    • Organizer
      IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations (2019)

    • Author(s)
      Yiming Wu, Tristan Carsault, Kazuyoshi Yoshii
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research

Published: 2019-04-18   Modified: 2024-01-30  
