Unified Theory of Singing Voice Recognition and Generation Based on Deep Bayesian Learning (深層ベイズ学習に基づく歌声の認識と生成の統一理論)

Research Project

Project/Area Number	19J15255
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Kyoto University
Principal Investigator	錦見 亮 (Ryo Nishikimi), Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC2)
Project Period (FY)	2019-04-25 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,100,000 (Direct Cost: ¥2,100,000); Fiscal Year 2020: ¥1,000,000 (Direct Cost: ¥1,000,000); Fiscal Year 2019: ¥1,100,000 (Direct Cost: ¥1,100,000)
Keywords	Music information processing / Automatic music transcription / Singing transcription
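Outline of Research at the Start	This research constructs a model that describes, in a unified manner, the process by which humans generate music and the process by which they recognize music, each reflecting their own individuality. On that basis, it establishes a methodology that simultaneously solves the generation tasks of music information processing (automatic generation and style conversion of singing voices and musical pieces that reflect individuality) and the recognition tasks (source separation, automatic transcription, and analysis of individuality and singing expression). These tasks are two sides of the same coin, and the methodology takes their mutual dependencies into account.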
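Outline of Annual Research Achievements	This research addresses singing transcription, the task of estimating the musical score of the main melody carried by the singing voice from a music audio signal. Because the main melody is closely tied to the impression that many pieces make, singing transcription is a key technology for bidirectional (recognition and generation) singing voice analysis. The pitch trajectory (F0 trajectory) of a singing voice deviates substantially from the pitches and onset times of the notes written in the score because of singing expressions such as vibrato and overshoot, so simple methods estimate musically unnatural note sequences. Conventional methods estimate the score by discretizing a pre-estimated F0 trajectory along the time and frequency axes. This approach propagates errors from the preliminary F0 estimation, and the boundaries between consecutive notes of the same pitch cannot be determined from an F0 trajectory that lacks note onset information. A method that handles music audio signals directly was therefore needed. To this end, we developed a generative model of music audio signals that integrates an acoustic model based on a deep neural network with a language model based on a conventional statistical model. In the proposed model, the language model is a semi-Markov model (SMM) that represents the process by which a note sequence is generated depending on the musical key. The acoustic model is a convolutional recurrent neural network (CRNN) that represents the process by which the observed music audio signal is generated from the notes. The proposed model estimates notes directly from the music signal with the Viterbi algorithm, exploiting both the grammatical knowledge about notes encoded in the language model and the expressive power of the CRNN acoustic model. In evaluation experiments using real music audio signals and synthesized singing voices, the proposed method outperformed a conventional singing transcription method that operates on singing F0 trajectories. It also outperformed score estimation with the acoustic model alone, which confirms the benefit of integrating the language model and the acoustic model.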
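Research Progress Status	Not entered because FY2020 (Reiwa 2) was the final fiscal year of the project.
Strategy for Future Research Activity	Not entered because FY2020 (Reiwa 2) was the final fiscal year of the project.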
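
The decoding step described in the Outline of Annual Research Achievements (Viterbi estimation of notes under a semi-Markov language model combined with acoustic-model scores) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the acoustic model has already produced a log-likelihood for every pitch at every time unit (the frame_loglik table below is a hand-made stand-in for CRNN outputs), and it omits the key dependence of the language model. All function and parameter names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def semi_markov_viterbi(
    frame_loglik: List[List[float]],  # frame_loglik[t][k]: log-likelihood of pitch k at time unit t (assumed given)
    log_init: List[float],            # log_init[k]: log P(first note has pitch k)
    log_trans: List[List[float]],     # log_trans[j][k]: log P(pitch k | previous pitch j)
    log_dur: List[float],             # log_dur[d - 1]: log P(note lasts d time units)
) -> List[Tuple[int, int, int]]:
    """Return the most probable note sequence as (pitch, onset, duration) triples."""
    T, K, D = len(frame_loglik), len(log_init), len(log_dur)
    # cum[k][t] holds the summed log-likelihood of pitch k over time units [0, t).
    cum = [[0.0] * (T + 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            cum[k][t + 1] = cum[k][t] + frame_loglik[t][k]
    # best[t][k]: best log-probability of covering [0, t) with the last note at pitch k.
    best = [[-math.inf] * K for _ in range(T + 1)]
    back: List[List[Optional[Tuple[int, Optional[int]]]]] = [
        [None] * K for _ in range(T + 1)
    ]
    for t in range(1, T + 1):
        for k in range(K):
            for d in range(1, min(D, t) + 1):
                s = t - d  # onset of the candidate note
                emit = cum[k][t] - cum[k][s] + log_dur[d - 1]
                if s == 0:
                    score, prev = log_init[k] + emit, None
                else:
                    # Self-transitions are allowed, so repeated notes of the
                    # same pitch can be separated by the duration model.
                    prev = max(range(K), key=lambda j: best[s][j] + log_trans[j][k])
                    score = best[s][prev] + log_trans[prev][k] + emit
                if score > best[t][k]:
                    best[t][k], back[t][k] = score, (d, prev)
    # Backtrace from the best final pitch.
    k: Optional[int] = max(range(K), key=lambda j: best[T][j])
    t, notes = T, []
    while t > 0:
        d, prev = back[t][k]
        notes.append((k, t - d, d))
        t, k = t - d, prev
    notes.reverse()
    return notes


# Toy input: two pitches, six time units; pitch 0 is likely first, then pitch 1.
hi, lo = math.log(0.9), math.log(0.1)
frame_loglik = [[hi, lo]] * 3 + [[lo, hi]] * 3
uniform2 = [math.log(0.5)] * 2
notes = semi_markov_viterbi(
    frame_loglik,
    log_init=uniform2,
    log_trans=[uniform2, uniform2],
    log_dur=[math.log(0.25)] * 4,  # durations of 1 to 4 units, equally likely
)
print(notes)  # prints [(0, 0, 3), (1, 3, 3)]
```

Because every additional note pays a transition and a duration cost, the decoder prefers the two-note segmentation over splitting the same audio into many short notes, which is the role the outline attributes to the language model.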

Report

(2 results)

2020 Annual Research Report
2019 Annual Research Report

Research Products
(28 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 1 result) / Presentation (25 results) (of which Int'l Joint Research: 12 results)

[Journal Article] Audio-to-Score Singing Transcription Based on a CRNN-HSMM Hybrid Model (2021)
- Author(s)
  Ryo Nishikimi, Eita Nakamura, Masataka Goto, Kazuyoshi Yoshii
- Journal Title
  APSIPA Transactions on Signal and Information Processing
  Volume: 10 Issue: 1 Pages: 1-13
- DOI
  10.1017/atsip.2021.4
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Statistical Method for Music Structure Analysis Based on a Hierarchical HSMM (2020)
- Author(s)
  柴田剛, 錦見亮, 中村栄太, 吉井和佳
- Journal Title
  IPSJ Journal (情報処理学会論文誌)
  Volume: 61 Issue: 4 Pages: 757-767
- DOI
  10.20729/00204224
- NAID
  170000181816
- Year and Date
  2020-04-15
- Related Report
  2019 Annual Research Report
- Peer Reviewed
[Journal Article] Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories (2020)
- Author(s)
  Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 1678-1691
- DOI
  10.1109/taslp.2020.2996095
- Related Report
  2020 Annual Research Report
- Peer Reviewed
[Presentation] Pitch-Timbre Disentanglement of Musical Instrument Sounds Based on VAE-Based Metric Learning (2021)
- Author(s)
  Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Statistical Correction of Transcribed Melody Notes Based on Probabilistic Integration of a Music Language Model and a Transcription Error Model (2021)
- Author(s)
  Yuki Hiramatsu, Go Shibata, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
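[Presentation] Joint Estimation of Note Values and Voices Based on Deep Learning for Piano Transcription [ピアノ採譜のための深層学習に基づく音価と声部の同時推定] (2021)
- Author(s)
  平松祐紀, 柴田剛, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 83rd National Convention of IPSJ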
- Related Report
  2020 Annual Research Report
[Presentation] Deep Beat Estimation Based on the Periodicity of Metrical Structure [拍節構造の周期性に基づく深層ビート推定] (2021)
- Author(s)
  大山偉永, 石塚崚斗, 錦見亮, 吉井和佳
- Organizer
  The 83rd National Convention of IPSJ
- Related Report
  2020 Annual Research Report
[Presentation] Difficulty-Aware Deep Piano Arrangement for Popular Music [ポピュラー音楽に対する難易度に応じた深層ピアノ編曲] (2021)
- Author(s)
  寺尾萌夢, 石塚崚斗, 錦見亮, 吉井和佳
- Organizer
  The 83rd National Convention of IPSJ
- Related Report
  2020 Annual Research Report
[Presentation] Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training (2020)
- Author(s)
  Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
- Organizer
  2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Music Structure Analysis Based on an LSTM-HSMM Hybrid Model (2020)
- Author(s)
  Go Shibata, Ryo Nishikimi, Kazuyoshi Yoshii
- Organizer
  The 21st Annual Conference of the International Society for Music Information Retrieval
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Multi-Instrument Music Transcription Based on Deep Spherical Clustering of Spectrograms and Pitchgrams (2020)
- Author(s)
  Keitaro Tanaka, Takayuki Nakatsuka, Ryo Nishikimi, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  The 21st Annual Conference of the International Society for Music Information Retrieval
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
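[Presentation] Deep Drum Transcription with a Self-Attention Mechanism Using Regularization Based on Global Structure [大局的構造に基づく正則化を用いた自己注意機構付き深層ドラム採譜] (2020)
- Author(s)
  石塚崚斗, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 129th Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)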
- Related Report
  2020 Annual Research Report
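[Presentation] Deep Drum Transcription Using Regularization with a Pretrained Language Model [事前学習済み言語モデルによる正則化を用いた深層ドラム採譜] (2020)
- Author(s)
  石塚崚斗, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 128th Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)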
- Related Report
  2020 Annual Research Report
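[Presentation] Music Structure Analysis Based on an LSTM-HSMM Hybrid Model [LSTM-HSMMハイブリッドモデルに基づく音楽構造解析] (2020)
- Author(s)
  柴田剛, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 128th Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)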
- Related Report
  2020 Annual Research Report
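[Presentation] Multi-Instrument Part Transcription Based on Deep Clustering of Spectrograms and Pitchgrams [スペクトログラムとピッチグラムの深層クラスタリングに基づく複数楽器パート採譜] (2020)
- Author(s)
  田中啓太郎, 中塚貴之, 錦見亮, 吉井和佳, 森島繁生
- Organizer
  The 128th Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)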
- Related Report
  2020 Annual Research Report
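[Presentation] Boundary Estimation and Labeling of Musical Sections Based on a Hierarchical Hidden Semi-Markov Model and Deep Learning [階層隠れセミマルコフモデルと深層学習に基づく楽曲セクションの境界推定とラベル付け] (2020)
- Author(s)
  柴田剛, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 82nd National Convention of IPSJ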
- Related Report
  2019 Annual Research Report
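[Presentation] Automatic Transcription of Arbitrary Instrument Parts Using Deep Clustering [深層クラスタリングを用いた任意楽器パートの自動採譜] (2020)
- Author(s)
  田中啓太郎, 中塚貴之, 錦見亮, 吉井和佳, 森島繁生
- Organizer
  The 82nd National Convention of IPSJ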
- Related Report
  2019 Annual Research Report
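[Presentation] Drum Transcription Based on the Integration of Deep Acoustic and Language Models [深層音響・言語モデルの統合に基づくドラム採譜] (2020)
- Author(s)
  石塚崚斗, 上田瞬, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 82nd National Convention of IPSJ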
- Related Report
  2019 Annual Research Report
[Presentation] End-to-End Melody Note Transcription Based on a Beat-Synchronous Attention Mechanism (2019)
- Author(s)
  Ryo Nishikimi, Eita Nakamura, Masataka Goto, Kazuyoshi Yoshii
- Organizer
  IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Automatic Singing Transcription Based on Encoder-Decoder Recurrent Neural Networks with a Weakly-Supervised Attention Mechanism (2019)
- Author(s)
  Ryo Nishikimi, Eita Nakamura, Satoru Fukayama, Masataka Goto, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
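[Presentation] Rhythm Transcription of Singing Voice Based on a Beat-Synchronous Attention Mechanism [ビート同期注意機構に基づく歌声のリズム採譜] (2019)
- Author(s)
  錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 124th Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)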
- Related Report
  2019 Annual Research Report
[Presentation] Joint Singing Pitch Estimation and Voice Separation Based on Neural Harmonic Structure Renderer (2019)
- Author(s)
  Tomoyasu Nakano, Kazuyoshi Yoshii, Yiming Wu, Ryo Nishikimi, Kin Wah Edward Lin, Masataka Goto
- Organizer
  IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Statistical Music Structure Analysis Based on a Homogeneity-, Repetitiveness-, and Regularity-Aware Hierarchical Hidden Semi-Markov Model (2019)
- Author(s)
  Go Shibata, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
- Organizer
  The 20th Annual Conference of the International Society for Music Information Retrieval
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Unsupervised Melody Style Conversion (2019)
- Author(s)
  Eita Nakamura, Kentaro Shibata, Ryo Nishikimi, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Joint Transcription of Lead, Bass, and Rhythm Guitars Based on a Factorial Hidden Semi-Markov Model (2019)
- Author(s)
  Kentaro Shibata, Ryo Nishikimi, Satoru Fukayama, Masataka Goto, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Bayesian Drum Transcription Based on Nonnegative Matrix Factor Decomposition with a Deep Score Prior (2019)
- Author(s)
  Shun Ueda, Kentaro Shibata, Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
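[Presentation] Correction of Singing Transcription Results Based on a Music Language Model and a Transcription Error Model [音楽言語モデルと採譜誤りモデルに基づく歌声採譜結果の訂正] (2019)
- Author(s)
  平松祐紀, 柴田剛, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 82nd National Convention of IPSJ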
- Related Report
  2019 Annual Research Report
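[Presentation] Drum Transcription Using Convolutive Nonnegative Matrix Factorization Based on a Deep Drum Score Prior [深層ドラム譜事前分布に基づく畳み込み非負値行列因子分解を用いたドラム採譜] (2019)
- Author(s)
  上田瞬, 柴田健太郎, 和田雄介, 錦見亮, 中村栄太, 吉井和佳
- Organizer
  The 122nd Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)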
- Related Report
  2019 Annual Research Report
