A Spoken Language Knowledge Expansion Framework for Real-World Speech Recognition Using Deep Learning Technology and Human Collaboration
Project/Area Number | 18K11431 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 61030: Intelligent informatics-related |
Research Institution | Shizuoka University |
Principal Investigator | Kai Atsuhiko, Shizuoka University, Faculty of Engineering, Associate Professor (60283496) |
Project Period (FY) | 2018-04-01 – 2023-03-31 |
Project Status | Completed (Fiscal Year 2022) |
Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2018: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
|
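Keywords | Automatic speech recognition / Deep learning / Long recordings / Automatic correction / Spoken term detection / Reading estimation / End-to-end / Real-time / Long-duration recording / Automatic subtitling / End-to-end speech recognition / Long-duration recorded speech / End-to-end speech recognition model / Environmental noise / Speech recognition error correction / Low-cost subtitle correction / Crosstalk speech / Target-speaker speech recognition / Speaker speech separation / Voice activity detection / Language knowledge expansion / Deep neural network (DNN) / Information accessibility / Lecture speech / Semi-automatic learning |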
Outline of Final Research Achievements |
To make automatic speech recognition (ASR) practical for automatic subtitling and retrieval of long recordings, we developed ASR-related technology that allows spoken language knowledge, such as new technical terms, to be expanded at low cost. Specifically, we built an ASR system that produces output in real time, and implemented a semi-automatic correction support system in which users do not edit the output text directly but only enter the corrected words. For the spoken term detection technique used to locate when each corrected word occurs in the recording, we used an end-to-end ASR model that infers the reading (pronunciation) of speech, which improved detection accuracy for unknown words; such words account for many of the misrecognized words. In addition, we developed speaker separation and voice activity detection methods for noisy and multi-speaker speech and confirmed their effectiveness.
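The report does not describe the detection algorithm itself, so the following is only a minimal sketch of one common way to do reading-based spoken term detection: approximate substring matching, by edit distance, of the reading of a user-entered corrected word against the reading sequence of the ASR output. It assumes the recognizer emits one reading unit (here a romanized syllable) per token with start and end times; `Token`, `detect_term`, and `max_errors` are hypothetical names, not part of the project's system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One reading unit from the ASR output (hypothetical format)."""
    reading: str  # reading symbol, e.g. a kana syllable or phone
    start: float  # start time in seconds
    end: float    # end time in seconds


def detect_term(query: List[str], tokens: List[Token],
                max_errors: int = 1) -> Optional[Tuple[int, float, float]]:
    """Find the token span whose readings best match `query`.

    Edit distance where the match may start and end anywhere in the
    token sequence (approximate substring matching). Returns
    (errors, start_time, end_time), or None if no span is within
    `max_errors` edits.
    """
    n, m = len(query), len(tokens)
    if n == 0 or m == 0:
        return None
    # cost[i][j]: fewest edits aligning query[:i] with a token span that
    # ends just before index j; begin[i][j]: index where that span starts.
    cost = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 all zero: free start
    begin = [list(range(m + 1)) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], begin[i][0] = i, 0
        for j in range(1, m + 1):
            options = [
                # match (0) or substitution (1)
                (cost[i - 1][j - 1] + (query[i - 1] != tokens[j - 1].reading),
                 begin[i - 1][j - 1]),
                # query symbol missing from the ASR output
                (cost[i - 1][j] + 1, begin[i - 1][j]),
                # extra token in the ASR output
                (cost[i][j - 1] + 1, begin[i][j - 1]),
            ]
            cost[i][j], begin[i][j] = min(options, key=lambda o: o[0])
    # Free end: choose the best end position over the whole sequence.
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j])
    errors, first = cost[n][best_j], begin[n][best_j]
    if errors > max_errors or first >= best_j:
        return None
    return errors, tokens[first].start, tokens[best_j - 1].end


# ASR output "kyou wa shisuoka desu": "Shizuoka" misrecognized as "shisuoka".
readings = ["ki", "yo", "u", "wa", "shi", "su", "o", "ka", "de", "su"]
toks = [Token(r, k * 0.25, (k + 1) * 0.25) for k, r in enumerate(readings)]
print(detect_term(["shi", "zu", "o", "ka"], toks))  # (1, 1.0, 2.0)
```

Matching on readings rather than on surface text is what lets a word the recognizer has never seen still be found: its misrecognized output often sounds close to it. The returned time span could then be used to place the corrected word in the subtitle timeline.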
Academic Significance and Societal Importance of the Research Achievements |
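In applying automatic speech recognition (ASR) to long recordings such as lectures and meetings, recent AI-based approaches lack methods for learning new words and topics efficiently and at low cost, and fully automatic subtitle generation has not yet reached practically usable recognition accuracy. This project proposed a framework that prioritizes real-time operation and in which only the text of new words is supplied manually, together with methods that improve the accuracy of ASR-based automatic subtitling and retrieval at low cost. The results demonstrated that this approach further broadens the range of practical applications of ASR technology.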
Report | 6 results |
Research Products | 17 results |