2022 Fiscal Year Final Research Report
A Spoken Language Knowledge Expansion Framework for Real-World Speech Recognition Using Deep Learning Technology and Human Collaboration
Project/Area Number |
18K11431
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 61030:Intelligent informatics-related
|
Research Institution | Shizuoka University |
Principal Investigator |
Kai Atsuhiko Shizuoka University, Faculty of Engineering, Associate Professor (60283496)
|
Project Period (FY) |
2018-04-01 – 2023-03-31
|
Keywords | automatic speech recognition / deep learning / long-duration recordings / automatic correction / spoken term detection / reading estimation / end-to-end / real-time |
Outline of Final Research Achievements |
To make automatic speech recognition (ASR) technology applicable to automatic subtitling and retrieval of long recordings, we developed ASR-related technology that allows spoken language knowledge, such as new technical terms, to be expanded at low cost. Specifically, we built an ASR system that produces output in real time, and implemented a semi-automatic correction support system in which users do not edit the output text directly but only enter the corrected words. Spoken term detection is used to find the times at which a corrected word occurs in the recording. For this step, we used an end-to-end ASR model that infers the reading of the speech, which improved detection accuracy for unknown words, a common source of misrecognition. In addition, we developed speaker separation and voice activity detection methods for noisy and multi-speaker speech and confirmed their effectiveness.
|
Free Research Field |
Spoken language processing
|
Academic Significance and Societal Importance of the Research Achievements |
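When automatic speech recognition (ASR) technology is applied to long-duration speech such as lectures and meetings, recent AI-based approaches lack methods for acquiring new words and topics efficiently and at low cost, and fully automatic subtitle generation and similar applications have not yet reached practical recognition accuracy. This project proposed a framework that emphasizes real-time operation and in which only the text of new words is supplied manually, together with a method for improving the accuracy of ASR-based automatic subtitling and retrieval at low cost. We thereby demonstrated that the applicability of ASR technology can be extended further.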
|