2023 Fiscal Year Research-status Report

Deep Bayesian Automatic Music Transcription Considering a Generative Process Based on the Three Elements of Sound

Research Project

Project/Area Number	22KJ2959
Allocation Type	Multi-year Fund
Research Institution	Waseda University
Principal Investigator	Keitaro Tanaka, Waseda University, Faculty of Science and Engineering, Research Fellow (DC1)
Project Period (FY)	2023-03-08 – 2025-03-31
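Keywords	Music information processing / Automatic music transcription / Multi-instrument transcription / Pitch-timbre disentanglement / Musical instrument recognition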
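Outline of Annual Research Achievements	This research addresses multi-instrument automatic music transcription, which estimates a score for every instrument that makes up a music audio signal. This fiscal year we mainly worked on extending the three-element disentanglement method developed last year, which explicitly accounts for differences in playing technique. The previous model worked for isolated single-note instrument sounds, but for monophonic melodies with time-varying pitch and for singing voices, the desired latent features leaked into the other latent spaces. To address this problem, we formulated a new probabilistic generative model that retains the previous model as a component module. Specifically, we proposed a training method in which the encoder and the decoder are each applied twice, so that unnecessary information is gradually eliminated from each latent space. This improved the disentanglement accuracy in the latent spaces for diverse inputs. In addition, we conducted research focused on musical instrument recognition. Instrument recognition is one of the mainstream problem settings in music information processing. However, accuracy has been evaluated only on a limited set of benchmark datasets, and the expected recognition accuracy is not obtained on other datasets, particularly those with little data. We therefore proposed an efficient way to use artificially created datasets and improved the recognition accuracy on other datasets. Furthermore, this fiscal year we also worked on extending derived techniques to other domains. Metric learning focusing on homogeneity, the predecessor of the three-element disentanglement method, was successfully applied to a lip-reading method robust to differences in speaking style. The model structure focusing on time-varying and time-invariant properties, the core of the three-element disentanglement method, was successfully applied to an efficient and accurate audio-visual speech enhancement method.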
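Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: The plan for this fiscal year was to develop a multi-instrument automatic transcription method that handles general instruments with a wide variety of timbres. Against this plan, in three-element disentanglement, which is indispensable for multi-instrument automatic transcription, we improved the disentanglement accuracy in the latent spaces for diverse inputs by formulating a new probabilistic generative model. We are currently compiling the results with a view to journal submission. The research on the instrument recognition task was carried out as an international collaboration with C4DM at Queen Mary University of London, UK. The results have already been presented at ISMIR, and the collaboration is ongoing. The extensions to other domains have been presented at Interspeech and EUSIPCO (nominated for the Best Student Paper Award).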
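Strategy for Future Research Activity	Building on the disentangled three-element space and the instrument recognition method constructed through this fiscal year, we will develop a multi-instrument automatic transcription method. Because the next fiscal year is the final year of this research project, we will also compile the overall results.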
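Causes of Carryover	Following the conversion to a fund-based grant from this fiscal year and the accelerating depreciation of the yen, a small amount was carried over to supplement journal publication fees in the next fiscal year. It is planned to be used for that purpose.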

Research Products
(11 results)

All Int'l Joint Research (1 results) Presentation (9 results) (of which Int'l Joint Research: 5 results) Remarks (1 results)

[Int'l Joint Research] Queen Mary University of London (UK)
- Country Name
  UNITED KINGDOM
- Counterpart Institution
  Queen Mary University of London
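[Presentation] A Metric-Learning-Based Method for Reducing the Accuracy Gap in Utterance Content Estimation from Videos of Normal and Silent Speech (2023)
- Author(s)
  Sara Kashiwagi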
- Organizer
  Visual Computing (VC) Long Track
[Presentation] Detecting Unknown Multiword Expressions in Natural English Reading via Eye Gaze (2023)
- Author(s)
  Taichi Higasa
- Organizer
  Visual Computing (VC) Short Track
[Presentation] Audio-Visual Speech Enhancement With Preserving Specific Off-Screen Speech (2023)
- Author(s)
  Tomoya Yoshinaga
- Organizer
  Visual Computing (VC) Short Track
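[Presentation] A Study on Reducing the Memory Consumption of Diffusion Probabilistic Models via Patch Division (2023)
- Author(s)
  Shinei Arakawa
- Organizer
  Meeting on Image Recognition and Understanding (MIRU), poster presentation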
[Presentation] On the Use of Synthesized Datasets and Transformer Adaptors for Musical Instrument Recognition (2023)
- Author(s)
  Keitaro Tanaka
- Organizer
  International Society for Music Information Retrieval (ISMIR) Late-Breaking Demo
- Int'l Joint Research
[Presentation] Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction (2023)
- Author(s)
  Tomoya Yoshinaga (equal contribution)
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Int'l Joint Research
[Presentation] Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning (2023)
- Author(s)
  Sara Kashiwagi (equal contribution)
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Int'l Joint Research
[Presentation] Memory Efficient Diffusion Probabilistic Models via Patch-based Generation (2023)
- Author(s)
  Shinei Arakawa
- Organizer
  IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) workshops, Generative Models for Computer Vision
- Int'l Joint Research
[Presentation] Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability (2023)
- Author(s)
  Taichi Higasa
- Organizer
  ACM International Conference on Multimodal Interaction (ICMI) workshops
- Int'l Joint Research
[Remarks] Keitaro Tanaka
- URL
  https://sites.google.com/view/keitarotanaka/
