Deep Bayesian Automatic Music Transcription Considering the Generative Process Based on the Three Elements of Sound

Research Project

Project/Area Number	22KJ2959
Project/Area Number (Other)	22J22424 (2022)
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund (2023) Single-year Grants (2022)
Section	Domestic
Review Section	Basic Section 61010:Perceptual information processing-related
Research Institution	Waseda University
Principal Investigator	Keitaro Tanaka, Waseda University, Faculty of Science and Engineering, JSPS Research Fellow (DC1)
Project Period (FY)	2023-03-08 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥2,500,000 (Direct Cost: ¥2,500,000); Fiscal Year 2024: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2023: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2022: ¥900,000 (Direct Cost: ¥900,000)
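Keywords	Music information processing / Automatic music transcription / Multi-instrument transcription / Pitch-timbre disentanglement / Musical instrument recognition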
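Outline of Research at the Start	Automatic music transcription, which converts music audio signals into musical scores, is a long-standing task in music information processing. However, it requires vast amounts of training data for the target instruments, and for instruments not included in the training data, performance degrades substantially and the transcription results become unnatural. This research addresses these problems comprehensively with a deep Bayesian model based on the three elements of sound. Specifically, it formalizes the perceptually grounded three elements of sound for computational use and develops a transcription method that explicitly accounts for how signals are generated from those three elements.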
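Outline of Annual Research Achievements	This research addresses multi-instrument automatic music transcription, which estimates a score for every instrument that makes up a music audio signal. This fiscal year, we mainly worked on extending the three-element disentanglement method developed last year, which explicitly accounts for differences in playing style. The previous model worked on single-note instrument sounds, but for monophonic inputs with time-varying pitch and for singing voices, the desired latent features leaked into the other latent spaces. To address this, we formulated a new probabilistic generative model that retains the previous model as a component module. Specifically, we proposed a training method in which the encoder and the decoder are each used twice, so that unnecessary information is gradually eliminated from each latent space. This improved disentanglement accuracy in the latent spaces for diverse inputs. In addition, we conducted research focused on musical instrument recognition. Instrument recognition is one of the mainstream problem settings in music information processing. However, accuracy has been evaluated only on a limited set of benchmark datasets, and the expected recognition accuracy is not obtained on other datasets, especially those with little data. We therefore proposed an efficient way to exploit artificially created datasets and improved recognition accuracy on other datasets. This fiscal year we also worked on extending derived techniques to other domains. Metric learning focused on homogeneity, the predecessor of the three-element disentanglement method, was successfully applied to a lip-reading method robust to differences in speaking style. The model structure focused on time-varying and time-invariant properties, which is the core of the three-element disentanglement method, was successfully applied to an efficient and accurate audio-visual speech enhancement method.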
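Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: The plan for this fiscal year was to develop a multi-instrument automatic transcription method that handles general instruments with a wide variety of timbres. Against this plan, in three-element disentanglement, which is indispensable for multi-instrument transcription, we improved disentanglement accuracy in the latent spaces for diverse inputs by formulating a new probabilistic generative model. We are currently compiling the results with a journal submission in mind. The research on musical instrument recognition was carried out as an international collaboration with the Centre for Digital Music (C4DM) at Queen Mary University of London, UK. The results have already been presented at ISMIR, and the collaboration is ongoing. The extensions to other domains were presented at Interspeech and EUSIPCO (nominated for the Best Student Paper Award).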
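Strategy for Future Research Activity	Building on the disentangled three-element space and the instrument recognition method constructed through this fiscal year, we will develop a multi-instrument automatic transcription method. Because the next fiscal year is the final year of this project, we will also compile the overall results.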

Report

(2 results)

2023 Research-status Report
2022 Annual Research Report

Research Products
(21 results)

Int'l Joint Research (1 result), Presentation (19 results, of which Int'l Joint Research: 7 results), Remarks (1 result)

[Int'l Joint Research] Queen Mary University of London (UK)
- Related Report
  2023 Research-status Report
[Presentation] A Metric-Learning-Based Method for Reducing the Accuracy Gap in Speech Content Estimation from Videos of Normal and Silent Speech (2023)
- Author(s)
  柏木爽良
- Organizer
  Visual Computing (VC) Long Track
- Related Report
  2023 Research-status Report
[Presentation] Detecting Unknown Multiword Expressions in Natural English Reading via Eye Gaze (2023)
- Author(s)
  Taichi Higasa
- Organizer
  Visual Computing (VC) Short Track
- Related Report
  2023 Research-status Report
[Presentation] Audio-Visual Speech Enhancement With Preserving Specific Off-Screen Speech (2023)
- Author(s)
  Tomoya Yoshinaga
- Organizer
  Visual Computing (VC) Short Track
- Related Report
  2023 Research-status Report
[Presentation] A Study on Reducing the Memory Consumption of Diffusion Probabilistic Models via Patch Division (2023)
- Author(s)
  荒川深映
- Organizer
  Meeting on Image Recognition and Understanding (MIRU), poster presentation
- Related Report
  2023 Research-status Report
[Presentation] On the Use of Synthesized Datasets and Transformer Adaptors for Musical Instrument Recognition (2023)
- Author(s)
  Keitaro Tanaka
- Organizer
  International Society for Music Information Retrieval (ISMIR) Late-Breaking Demo
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction (2023)
- Author(s)
  Tomoya Yoshinaga (equal contribution)
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning (2023)
- Author(s)
  Sara Kashiwagi (equal contribution)
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Memory Efficient Diffusion Probabilistic Models via Patch-based Generation (2023)
- Author(s)
  Shinei Arakawa
- Organizer
  IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) workshops, Generative Models for Computer Vision
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability (2023)
- Author(s)
  Taichi Higasa
- Organizer
  ACM International Conference on Multimodal Interaction (ICMI) workshops
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Passing Through Specific Background Speech in Speech Enhancement for On-Screen Speakers (2023)
- Author(s)
  吉永朋矢
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)
- Related Report
  2022 Annual Research Report
[Presentation] Music-Driven Image Style Transfer Based on Arousal and Valence (2023)
- Author(s)
  神庭有花
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)
- Related Report
  2022 Annual Research Report
[Presentation] A Metric-Learning-Based Method for Improving Accuracy in Speech Content Estimation from Silently Mouthed Speech Videos (2023)
- Author(s)
  柏木爽良
- Organizer
  The 85th National Convention of the Information Processing Society of Japan (IPSJ)
- Related Report
  2022 Annual Research Report
[Presentation] Unsupervised Disentanglement of Timbral, Pitch, and Variation Features From Musical Instrument Sounds With Random Perturbation (2022)
- Author(s)
  Keitaro Tanaka
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Audio-Driven Violin Performance Animation with Clear Fingering and Bowing (2022)
- Author(s)
  Asuka Hirata
- Organizer
  ACM International Conference and Exhibition on Computer Graphics and Interactive Techniques (SIGGRAPH) Posters
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Automatic Generation of Violin Performance Animation from Audio Signals Reflecting Fingering and Bowing (2022)
- Author(s)
  平田明日香
- Organizer
  Visual Computing (VC) Short Track
- Related Report
  2022 Annual Research Report
[Presentation] Estimating Comprehension of English Phrases Based on Gaze Information and Degree of Figurativeness (2022)
- Author(s)
  樋笠泰祐
- Organizer
  Workshop on Interactive Systems and Software (WISS)
- Related Report
  2022 Annual Research Report
[Presentation] A Study of a Metric-Learning-Based Method for Improving Accuracy in Speech Content Estimation from Silently Mouthed Speech Videos (2022)
- Author(s)
  柏木爽良
- Organizer
  Visual Computing Workshop (VCWS)
- Related Report
  2022 Annual Research Report
[Presentation] Simultaneous Speech Extraction of the On-Screen Speaker and a Specific Background Speaker from an Input Video (2022)
- Author(s)
  吉永朋矢
- Organizer
  Visual Computing Workshop (VCWS)
- Related Report
  2022 Annual Research Report
[Presentation] Patch-based Memory Efficient Diffusion Probabilistic Models (2022)
- Author(s)
  Shinei Arakawa
- Organizer
  Visual Computing (VC) Posters
- Related Report
  2022 Annual Research Report
[Remarks] Keitaro Tanaka
- URL
  https://sites.google.com/view/keitarotanaka/
- Related Report
  2023 Research-status Report 2022 Annual Research Report
