2023 Fiscal Year Annual Research Report

Research on retrieving speech and acoustic dark data

Research Project

Project/Area Number	22H03639
Allocation Type	Single-year Grants
Research Institution	The University of Tokyo
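Principal Investigator	Shinnosuke Takamichi (高道慎之介), The University of Tokyo, Graduate School of Information Science and Technology, Lecturer (90784330)
Co-Investigator (Kenkyū-buntansha)	Keisuke Imoto (井本桂右), Doshisha University, Faculty of Science and Engineering, Associate Professor (90802116)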
Project Period (FY)	2022-04-01 – 2026-03-31
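Keywords	speech recognition and synthesis / acoustic recognition and synthesis / dark data / corpus
Outline of Annual Research Achievements	This fiscal year we proposed (1) speech synthesis based on an evaluation loop, (2) a method for building corpora from Web data, and (3) joint analysis of sound events and acoustic scenes. (1) For speech synthesis based on an evaluation loop, we proposed a method for building speech synthesis from dark data. Because data cleansing applied to dark data is not necessarily optimal for machine learning, we proposed a method that performs data cleansing and selection so that the final machine learning performance is maximized. (2) For corpus construction from Web data, we proposed a method that uses the metadata attached to videos to build the desired speech data. (3) For joint analysis of sound events and scenes, we proposed a method for analyzing a kind of data that frequently appears in noisy data such as Web data (given in the original as "乱出データ").
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The research is progressing as originally planned.
Strategy for Future Research Activity	We will proceed with distributing large-scale speech and acoustic models and large-scale speech corpora.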

Research Products
(14 results)

All Int'l Joint Research (1 results) Journal Article (2 results) (of which Int'l Joint Research: 1 results, Peer Reviewed: 2 results, Open Access: 2 results) Presentation (11 results) (of which Int'l Joint Research: 5 results)

[Int'l Joint Research] Carnegie Mellon University (U.S.A.)
- Country Name
  U.S.A.
- Counterpart Institution
  Carnegie Mellon University
[Journal Article] SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources (2024)
- Author(s)
  Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari
- Journal Title
  IEEE Access
- Peer Reviewed / Open Access
[Journal Article] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis (2024)
- Author(s)
  Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Peer Reviewed / Open Access / Int'l Joint Research
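[Presentation] A Japanese Free-Form Description Corpus for Environmental Sounds and Benchmark Analysis (2024)
- Author(s)
  岡本悠希, 高道慎之介, 森松亜依, 渡邊亞椰, 井本桂右, and 山下洋一
- Organizer
  Annual Meeting of the Association for Natural Language Processing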
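[Presentation] Coco-Nut: A Paired Dataset of Multi-Speaker Speech and Free-Form Voice Characteristics Descriptions for Voice Quality Control by Free-Form Description (2023)
- Author(s)
  渡邊亞椰, 高道慎之介, 齋藤佑樹, 辛徳泰, and 猿渡洋
- Organizer
  Acoustical Society of Japan Autumn Meeting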
[Presentation] Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control (2023)
- Author(s)
  Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop
- Int'l Joint Research
[Presentation] Do Speech Symbols Acquired by Deep Learning Follow Zipf's Law as Natural Language Symbols Do? (2023)
- Author(s)
  前田紘希, 高道慎之介, 朴浚鎔, and 猿渡洋
- Organizer
  Acoustical Society of Japan Autumn Meeting
[Presentation] Speech Synthesis from Dark Data by Data Selection Using a Training-Evaluation Loop (2023)
- Author(s)
  関健太郎, 高道慎之介, 佐伯高明, and 猿渡洋
- Organizer
  Acoustical Society of Japan Spring Meeting
[Presentation] Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images (2023)
- Author(s)
  Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, and Hiroshi Saruwatari
- Organizer
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics (2023)
- Author(s)
  Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, and Hiroshi Saruwatari
- Organizer
  Proceedings of Interspeech
- Int'l Joint Research
[Presentation] Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection (2023)
- Author(s)
  Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, and Hiroshi Saruwatari
- Organizer
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Approach (2023)
- Author(s)
  Ami Igarashi, Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Noboru Harada, and Keisuke Imoto
- Organizer
  Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Int'l Joint Research
[Presentation] Sound Event Detection Considering Variation among Annotators (2023)
- Author(s)
  古賀直樹, 坂東宣昭, 井本桂右
- Organizer
  The 86th National Convention of the Information Processing Society of Japan
[Presentation] Investigating the Bias of Pre-trained Models in Environmental Sound Analysis (2023)
- Author(s)
  井上かほり, 井本桂右
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
