A Spoken Language Knowledge Expansion Framework for Real-World Speech Recognition Using Deep Learning Technology and Human Collaboration

Research Project

Project/Area Number	18K11431
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Shizuoka University
Principal Investigator	Kai Atsuhiko, Shizuoka University, Faculty of Engineering, Associate Professor (60283496)
Project Period (FY)	2018-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2018: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
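Keywords	automatic speech recognition / deep learning / long recordings / automatic correction / spoken term detection / reading estimation / end-to-end / real-time / long-duration recording / automatic subtitling / end-to-end speech recognition / long-duration recorded speech / end-to-end speech recognition model / environmental noise / speech recognition error correction / low-cost subtitle correction / crosstalk speech / end-to-end ASR model / target-speaker speech recognition / speaker speech separation / voice activity detection / language knowledge expansion / deep neural network (DNN) / information accessibility / lecture speech / semi-automatic learning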
Outline of Final Research Achievements	To make automatic speech recognition (ASR) applicable to automatic subtitling and retrieval for long recordings, we developed ASR-related techniques that allow spoken language knowledge, such as new technical terms, to be expanded at low cost. Specifically, we built an ASR system that produces output in real time and realized a semi-automatic correction support system in which users do not edit the output text directly but only enter the corrected words. For the spoken term detection technique used to find the point in the recording where a corrected word occurs, we used an end-to-end ASR model that infers the reading of speech, which improved detection accuracy for unknown words, which account for many misrecognized words. In addition, we developed speaker separation and voice activity detection methods for noisy and multi-speaker speech and confirmed their effectiveness.
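Academic Significance and Societal Importance of the Research Achievements	When automatic speech recognition (ASR) is applied to long recordings such as lectures and meetings, recent AI-based approaches lack methods for acquiring new words and topics efficiently at low cost, and fully automatic subtitle generation has not yet reached practical recognition accuracy. This project proposed a framework that emphasizes real-time operation and requires only the text of new words to be supplied manually, together with methods that improve the accuracy of ASR-based automatic subtitling and retrieval at low cost. The results demonstrated that this approach can further broaden the applicability of ASR technology.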
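
The correction workflow summarized in the outline, in which the user supplies only the corrected word and the system locates where it belongs, can be illustrated with a minimal sketch. This is not the project's implementation: the project performs spoken term detection on the recording with an end-to-end ASR model, whereas this sketch assumes the ASR output is already available as tokens with estimated readings and timestamps and simply matches readings by normalized edit distance. The names `Token`, `find_best_span`, `apply_correction`, the romanized readings, and the threshold of 0.4 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    surface: str   # recognized text
    reading: str   # estimated reading (romanized here for illustration)
    start: float   # start time in seconds
    end: float     # end time in seconds


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_best_span(tokens: List[Token], query_reading: str,
                   max_span: int = 3) -> Optional[Tuple[float, int, int]]:
    """Return (score, i, j) for the token span [i, j) whose concatenated
    reading is closest to the query reading (lower score is better)."""
    best = None
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_span, len(tokens)) + 1):
            reading = "".join(t.reading for t in tokens[i:j])
            denom = max(len(reading), len(query_reading), 1)
            score = edit_distance(reading, query_reading) / denom
            if best is None or score < best[0]:
                best = (score, i, j)
    return best


def apply_correction(tokens: List[Token], word: str, word_reading: str,
                     threshold: float = 0.4) -> List[Token]:
    """Replace the best-matching span with the user-supplied word.
    The threshold is an assumed value; no match below it leaves the
    hypothesis unchanged."""
    best = find_best_span(tokens, word_reading)
    if best is None or best[0] > threshold:
        return tokens
    _, i, j = best
    merged = Token(word, word_reading, tokens[i].start, tokens[j - 1].end)
    return tokens[:i] + [merged] + tokens[j:]


# Hypothetical ASR output in which an unknown word was split into two tokens.
tokens = [
    Token("we", "wi", 0.0, 0.2),
    Token("use", "yuuzu", 0.2, 0.5),
    Token("trance", "toransu", 0.5, 0.9),
    Token("former", "fooma", 0.9, 1.3),
    Token("models", "moderu", 1.3, 1.8),
]
corrected = apply_correction(tokens, "Transformer", "toransufooma")
print([t.surface for t in corrected])  # ['we', 'use', 'Transformer', 'models']
```

The user never edits the transcript itself: only the word and its reading are supplied, and the replaced span inherits the timestamps of the tokens it covers, which is what a subtitle display would need.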

Report

(6 results)

2022 Annual Research Report, Final Research Report (PDF)
2021 Research-status Report
2020 Research-status Report
2019 Research-status Report
2018 Research-status Report

Research Products
(17 results)

Journal Article (1 result; of which Peer Reviewed: 1, Open Access: 1), Presentation (16 results; of which Int'l Joint Research: 6)

[Journal Article] Domain Adaptation with Augmented Data by Deep Neural Network Based Method Using Re-Recorded Speech for Automatic Speech Recognition in Real Environment (2022)
- Author(s)
  Nahar Raufun, Miwa Shogo, Kai Atsuhiko
- Journal Title
  Sensors
  Volume: 22, Issue: 24, Pages: 9945
- DOI
  10.3390/s22249945
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Domain Adaptation for Improving End-to-end ASR Performance of Classroom Speech with Variable Recording Condition (2023)
- Author(s)
  R. Nahar, R. Suzuki, A. Kai
- Organizer
  Speech Research Group (音声研究会)
- Related Report
  2022 Annual Research Report
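[Presentation] Multi-dialect Speech Recognition Model Using the Self-supervised Learning Model XLSR and a Japanese Dialect Corpus [自己教師有り学習モデルXLSRと日本語諸方言コーパスを利用した諸方言音声認識モデル] (2023)
- Author(s)
  三輪祥吾, 甲斐充彦
- Organizer
  Speech Research Group (音声研究会)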
- Related Report
  2022 Annual Research Report
[Presentation] Robust Query-by-example Spoken Term Detection for Unknown Words Using Speech Retrieval-oriented E2E ASR Modeling (2021)
- Author(s)
  Takumi Kurokawa, Atsuhiko Kai
- Organizer
  IEEE 10th Global Conference on Consumer Electronics (GCCE2021)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Retrieval-oriented E2E ASR Modeling for Improved Query-by-example Spoken Term Detection (2021)
- Author(s)
  Takumi Kurokawa, Atsuhiko Kai
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Efficient Channel Adaptation of ASR by DNN-based Data Augmentation using Re-recorded Paired data with Automatic Alignment Correction (2021)
- Author(s)
  Nahar Raufun, Kai Atsuhiko
- Organizer
  Acoustical Society of Japan 2021 Spring Meeting
- Related Report
  2020 Research-status Report
[Presentation] Effect of Data Augmentation on DNN-Based VAD for Automatic Speech Recognition in Noisy Environment (2020)
- Author(s)
  Nahar Raufun, Kai Atsuhiko
- Organizer
  IEEE 9th Global Conference on Consumer Electronics (GCCE 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Effects of End-to-end ASR and Score Fusion Model Learning for Improved Query-by-example Spoken Term Detection (2020)
- Author(s)
  Takumi Kurokawa, Atsuhiko Kai, Hiroki Kondo
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
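[Presentation] Adaptation to Meeting Speech and Mitigation of Crosstalk Speech Effects in End-to-end Speech Recognition [End-to-end 音声認識における会議音声への適応および回り込み音声の影響軽減] (2020)
- Author(s)
  大内一亜, 甲斐充彦
- Organizer
  IEICE Speech Research Group (電子情報通信学会音声研究会)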
- Related Report
  2019 Research-status Report
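[Presentation] F0 Estimation with a CNN-based Discriminative Model and Its Evaluation on Singing Voice with Accompaniment and Read Speech in Noisy Environments [CNNベース識別モデルによるF0推定と伴奏重畳歌唱音声および雑音環境下読み上げ音声における評価] (2020)
- Author(s)
  川村智規，甲斐充彦，中川聖一
- Organizer
  Acoustical Society of Japan 2020 Spring Meeting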
- Related Report
  2019 Research-status Report
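[Presentation] Effect of Combining Vocabulary Adaptation with a Semi-automatic Correction Method Based on Online Instruction of Corrected Words for Lecture Speech Recognition [講演音声認識の修正語のオンライン教示による半自動的な修正手法と語彙適応の併用の効果] (2019)
- Author(s)
  寺田侑司, 塚本皓斗, 甲斐充彦
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting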
- Related Report
  2019 Research-status Report
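[Presentation] Analysis of the Effect of Crosstalk Speech on Lecture Speech Recognition and a Study of Improvement with a DNN Speech Separation Model [講義音声認識のための回り込み音声の影響分析とDNN音声分離モデルによる改善の一検討] (2019)
- Author(s)
  脇屋義也, 福井明日香, 甲斐充彦
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting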
- Related Report
  2019 Research-status Report
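[Presentation] F0 Estimation with a CNN-based Discriminative Model and Its Evaluation on Singing and Read Speech [CNNベース識別モデルによるF0推定と歌唱および読み上げ音声における評価] (2019)
- Author(s)
  川村智規，甲斐充彦，中川聖一
- Organizer
  21st Spoken Language Symposium (IPSJ Special Interest Group on Spoken Language Processing)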
- Related Report
  2019 Research-status Report
[Presentation] Multi-Condition Training of Denoising Autoencoder by Augmenting Simulated Reverberant Speech Data (2018)
- Author(s)
  Nahar Raufun, Kawai Takashi, Kai Atsuhiko
- Organizer
  2018 IEEE 7th Global Conference on Consumer Electronics (GCCE 2018)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Noise Robust Fundamental Frequency Estimation of Speech using CNN-based discriminative modeling (2018)
- Author(s)
  Kawamura Tomonori, Kai Atsuhiko, Nakagawa Seiichi
- Organizer
  5th International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
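[Presentation] Effect of Score Fusion Model Learning in Spoken Term Detection from Spoken Queries [音声クエリからの音声検索語検出におけるスコア統合モデル学習の効果] (2018)
- Author(s)
  近藤宏樹，甲斐充彦，大石修司
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting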
- Related Report
  2018 Research-status Report
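[Presentation] Noise-robust Fundamental Frequency Estimation with a CNN-based Discriminative Model [CNN ベース識別モデルによる雑音に頑健な基本周波数の推定] (2018)
- Author(s)
  川村智規，甲斐充彦，中川聖一
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting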
- Related Report
  2018 Research-status Report
