Research on retrieving speech and acoustic dark data

Research Project

Project/Area Number	23K24895
Project/Area Number (Other)	22H03639 (2022-2023)
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund (2024); Single-year Grants (2022-2023)
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
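Research Institution	Keio University (2024); The University of Tokyo (2022-2023)
Principal Investigator	Shinnosuke Takamichi, Keio University, Faculty of Science and Technology (Yagami), Associate Professor (90784330)
Co-Investigator(Kenkyū-buntansha)	Keisuke Imoto, Doshisha University, Faculty of Culture and Information Science, Associate Professor (90802116); Hiroshi Saruwatari, The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974)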
Project Period (FY)	2022-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥17,160,000 (Direct Cost: ¥13,200,000, Indirect Cost: ¥3,960,000)
  Fiscal Year 2025: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
  Fiscal Year 2024: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
  Fiscal Year 2023: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
  Fiscal Year 2022: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Keywords	Speech recognition and synthesis / Acoustic recognition and synthesis / Dark data / Corpus
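Outline of Research at the Start	This research project addresses a methodology for building very large-scale, widely usable speech and acoustic corpora from speech and acoustic dark data. To this end, we (1) develop web engineering techniques that automatically collect speech and acoustic dark data, (2) develop machine learning techniques that quantify the usability of dark data, (3) develop efficient labeling methods for large-scale data, and (4) conduct evaluations on a variety of speech and acoustic recognition and synthesis tasks.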
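Outline of Annual Research Achievements	This fiscal year, we proposed (1) speech synthesis based on an evaluation loop, (2) a method for building corpora from web data, and (3) joint analysis of sound events and acoustic scenes. (1) For speech synthesis based on an evaluation loop, we proposed a method for building speech synthesis from dark data. Because data refinement applied to dark data is not necessarily optimal for machine learning, we proposed a method that performs data refinement and selection so as to maximize the final machine learning performance. (2) For corpus construction from web data, we proposed a method that uses the metadata attached to videos to build the desired speech data. (3) For joint analysis of sound events and acoustic scenes, we proposed a method for analyzing the irregular data (乱出データ) that often appears in noisy data such as web data.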
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The research is progressing as originally planned.
Strategy for Future Research Activity	We will proceed with distributing large-scale speech and acoustic models and large-scale speech corpora.

Report

(2 results)

2023 Annual Research Report
2022 Annual Research Report

Research Products
(22 results)

Int'l Joint Research (1 result) / Journal Article (3 results; of which Int'l Joint Research: 1, Peer Reviewed: 3, Open Access: 3) / Presentation (18 results; of which Int'l Joint Research: 8, Invited: 1)

[Int'l Joint Research] Carnegie Mellon University (USA)
- Related Report
  2023 Annual Research Report
[Journal Article] SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources (2024)
- Author(s)
  Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, and Hiroshi Saruwatari
- Journal Title
  IEEE Access
  Volume: -
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis (2024)
- Author(s)
  Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, and Hiroshi Saruwatari
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: -
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Onoma-to-wave: Environmental Sound Synthesis from Onomatopoeic Words (2022)
- Author(s)
  Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, and Yoichi Yamashita
- Journal Title
  APSIPA Transactions on Signal and Information Processing
  Volume: 11 Pages: 1-20
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
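[Presentation] A Japanese Free-Form Description Corpus for Environmental Sounds and Its Benchmark Analysis (2024)
- Author(s)
  Yuki Okamoto, Shinnosuke Takamichi, 森松亜依, Aya Watanabe, Keisuke Imoto, and Yoichi Yamashita
- Organizer
  National Convention of the Association for Natural Language Processing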
- Related Report
  2023 Annual Research Report
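[Presentation] Coco-Nut: A Paired Dataset of Multi-Speaker Speech and Free-Form Voice Characteristics Descriptions for Voice Characteristics Control by Free-Form Description (2023)
- Author(s)
  Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, and Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan Autumn Meeting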
- Related Report
  2023 Annual Research Report
[Presentation] Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control (2023)
- Author(s)
  Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, and Hiroshi Saruwatari
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
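[Presentation] Do Speech Symbols Acquired by Deep Learning Follow Zipf's Law as Natural Language Symbols Do? (2023)
- Author(s)
  前田紘希, Shinnosuke Takamichi, Joonyong Park, and Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan Autumn Meeting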
- Related Report
  2023 Annual Research Report
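[Presentation] Speech Synthesis from Dark Data via Data Selection with a Training-Evaluation Loop (2023)
- Author(s)
  Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, and Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan Spring Meeting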
- Related Report
  2023 Annual Research Report
[Presentation] Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images (2023)
- Author(s)
  Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, and Hiroshi Saruwatari
- Organizer
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics (2023)
- Author(s)
  Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, and Hiroshi Saruwatari
- Organizer
  Proceedings of Interspeech
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection (2023)
- Author(s)
  Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, and Hiroshi Saruwatari
- Organizer
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Approach (2023)
- Author(s)
  Ami Igarashi, Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Noboru Harada, and Keisuke Imoto
- Organizer
  Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
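[Presentation] Sound Event Detection Considering Variation Among Annotators (2023)
- Author(s)
  古賀直樹, 坂東宣昭, and Keisuke Imoto
- Organizer
  86th National Convention of the Information Processing Society of Japan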
- Related Report
  2023 Annual Research Report
[Presentation] An Investigation of Bias in Pretrained Models for Environmental Sound Analysis (2023)
- Author(s)
  井上かほり and Keisuke Imoto
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
- Related Report
  2023 Annual Research Report
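[Presentation] An Invitation to the DCASE Challenge, an Integrated Competition Toward Machine Understanding and Interpretation of Environmental Sounds (2023)
- Author(s)
  Keisuke Imoto
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting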
- Related Report
  2022 Annual Research Report
- Invited
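[Presentation] Visual onoma-to-wave: A Proposal for Environmental Sound Synthesis Using Visual Onomatopoeias and Sound-Source Images (2023)
- Author(s)
  Hien Ohnaka
- Organizer
  IEICE Technical Committee on Speech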
- Related Report
  2022 Annual Research Report
[Presentation] Visual Onoma-to-Wave: Environmental Sound Synthesis From Visual Onomatopoeias and Sound-Source Images (2023)
- Author(s)
  Hien Ohnaka
- Organizer
  Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection (2023)
- Author(s)
  Kentaro Seki
- Organizer
  Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Speech Synthesis from Dark Data via Data Selection with a Training-Evaluation Loop (2023)
- Author(s)
  Kentaro Seki
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Annual Research Report
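[Presentation] A Study of Subjective Evaluation Methods According to the Input Information for Environmental Sound Synthesis (2022)
- Author(s)
  Yuki Okamoto
- Organizer
  Acoustical Society of Japan 2022 Autumn Meeting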
- Related Report
  2022 Annual Research Report
[Presentation] How Should We Evaluate Synthesized Environmental Sounds (2022)
- Author(s)
  Yuki Okamoto
- Organizer
  Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
