2018 Fiscal Year Final Research Report

A study on acoustic model adaptation for deep-learning-based speech recognition

Project/Area Number	16K00227
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Yamagata University
Principal Investigator	Kosaka Tetsuo, Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
Research Collaborator	KATO Masaharu
Project Period (FY)	2016-04-01 – 2019-03-31
Keywords	speech recognition / acoustic model / deep neural network / adaptation techniques / spontaneous speech / emotional speech / voice activity detection
Outline of Final Research Achievements	Although deep-learning-based speech recognition has advanced greatly in recent years, recognition of spontaneous speech has not yet reached sufficient performance. The major factors behind performance degradation in speech recognition include variation in speaker characteristics, acoustic environments, and speaking styles. To address these problems, I developed techniques centered on acoustic-model adaptation to improve speech-recognition performance. As a result, recognition performance improved for both spontaneous and emotional speech. The performance of voice-activity detection also improved.
Free Research Field	Speech information processing
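Academic Significance and Societal Importance of the Research Achievements	This research achieved 1) improved adaptation accuracy in spontaneous speech recognition, 2) improved accuracy of voice activity detection in noisy conditions, and 3) improved performance in emotional speech recognition. The adaptation method in 1) is a highly versatile technique that is applicable not only to spontaneous speech recognition but also to other fields. A multimodal dialogue corpus has been built using the results of 2), which should be useful to researchers in the field. The results of 3) can also be applied in a variety of areas, such as conversation between robots and humans. The techniques developed in this research therefore have broad ripple effects and are considered to be of high academic and societal significance.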