2018 Fiscal Year Final Research Report
A study on acoustic model adaptation for deep-learning-based speech recognition
Project/Area Number |
16K00227
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Perceptual information processing
|
Research Institution | Yamagata University |
Principal Investigator |
Kosaka Tetsuo Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
|
Research Collaborator |
KATO Masaharu
|
Project Period (FY) |
2016-04-01 – 2019-03-31
|
Keywords | speech recognition / acoustic model / deep neural network / adaptation techniques / spontaneous speech / emotional speech / voice activity detection |
Outline of Final Research Achievements |
Although deep-learning-based speech recognition has advanced greatly in recent years, recognition of spontaneous speech has not yet reached sufficient performance. The major factors that degrade recognition performance include variation in speaker characteristics, acoustic environments, and speaking styles. To address these problems, I developed techniques centered on acoustic-model adaptation to improve speech-recognition performance. As a result, recognition performance improved for spontaneous and emotional speech. The performance of voice-activity detection was also improved.
|
Free Research Field |
Speech information processing
|
Academic Significance and Societal Importance of the Research Achievements |
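This research achieved 1) improved adaptation accuracy in spontaneous speech recognition, 2) improved accuracy of voice activity detection in noisy conditions, and 3) improved performance in emotional speech recognition. The adaptation method in 1) is not limited to spontaneous speech recognition; it can be applied in other fields and is a highly versatile technique. A multimodal dialogue corpus has been prepared using the results of 2), which should be useful to researchers in the field. The results of 3) can likewise be used in a variety of fields, such as conversation between robots and humans. In sum, the techniques developed in this research have a strong ripple effect and are considered to have high academic and social significance.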
|