2015 Fiscal Year Final Research Report
Improvement of large vocabulary speech recognition performance based on high-precision lexical prosody prediction
Project/Area Number |
25540064
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Perceptual information processing
|
Research Institution | The University of Tokyo |
Principal Investigator |
Minematsu Nobuaki, The University of Tokyo, Graduate School of Engineering, Professor (90273333)
|
Project Period (FY) |
2013-04-01 – 2016-03-31
|
Keywords | Speech recognition / Prosodic features / Accent phrase boundary / Accent nucleus position / Re-ranking / Average perceptron / CRF / Structural representation |
Outline of Final Research Achievements |
In Japanese, the lexical prosody of words often changes when words are combined. In speech recognition research, re-ranking is often used to re-evaluate the multiple recognition hypotheses generated by a recognizer and select the final one. Comparing the lexical prosody predicted from each hypothesis with that estimated from the input utterance is expected to improve re-ranking. We successfully implemented 1) lexical prosody prediction from hypotheses and 2) re-ranking of hypotheses based on lexical prosody, but building a module that precisely estimates lexical prosody information from an utterance alone proved extremely difficult. We therefore turned to another strategy, applying quasi-prosody to re-ranking. In the new strategy, structural features are predicted from the hypotheses and also estimated from the input utterance. Experiments showed that structural re-ranking is highly effective.
|
Free Research Field |
Speech science and speech engineering
|
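The re-ranking idea described in the outline can be illustrated with a minimal sketch. It assumes each hypothesis carries a recognizer score and a feature vector predicted from its text, and that a comparable feature vector has been estimated from the input utterance. The names `Hypothesis`, `rerank`, and `mismatch_weight`, the Euclidean distance, and the fixed-weight linear combination are all assumptions made for illustration; the keywords mention an average perceptron and CRF, which this sketch does not implement.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    text: str
    recognizer_score: float          # higher is better (assumed scale)
    predicted_features: List[float]  # features predicted from the hypothesis text


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank(hypotheses: List[Hypothesis],
           observed_features: List[float],
           mismatch_weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by recognizer score minus a weighted mismatch penalty.

    The penalty is the distance between the features predicted from the
    hypothesis and the features estimated from the utterance.
    """
    def combined(h: Hypothesis) -> float:
        penalty = euclidean(h.predicted_features, observed_features)
        return h.recognizer_score - mismatch_weight * penalty

    return sorted(hypotheses, key=combined, reverse=True)


# Toy data, invented purely for illustration.
n_best = [
    Hypothesis("hypothesis A", -10.0, [1.0, 0.0, 1.0]),
    Hypothesis("hypothesis B", -10.5, [0.0, 1.0, 0.0]),
]
observed = [0.0, 1.0, 0.2]

best = rerank(n_best, observed)[0]
print(best.text)
```

With the toy values above, hypothesis B has the lower recognizer score but a much smaller feature mismatch, so it is ranked first; setting `mismatch_weight=0.0` reduces the ranking to the recognizer scores alone.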