2015 Fiscal Year Final Research Report

Improvement of large vocabulary speech recognition performance based on high-precision lexical prosody prediction

Project/Area Number	25540064
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	The University of Tokyo
Principal Investigator	Minematsu Nobuaki, The University of Tokyo, Graduate School of Engineering, Professor (90273333)
Project Period (FY)	2013-04-01 – 2016-03-31
Keywords	Speech recognition / Prosodic features / Accent phrase boundary / Accent nucleus position / Re-ranking / Averaged perceptron / CRF / Structural representation
Outline of Final Research Achievements	Japanese has the distinctive characteristic that the lexical prosody of words often changes when they are combined into phrases. In speech recognition research, re-ranking is commonly used to re-evaluate the multiple recognition hypotheses generated by a recognizer and select the final output. We expected that comparing the lexical prosody predicted from each hypothesis with the lexical prosody estimated from the input utterance would make more accurate re-ranking possible. We successfully implemented (1) lexical prosody prediction from hypotheses and (2) re-ranking of hypotheses based on lexical prosody, but found it extremely difficult to build a module that estimates lexical prosody precisely from the utterance alone. We therefore turned to an alternative strategy that applies quasi-prosodic information to re-ranking: structural features are predicted from each hypothesis and also estimated from the input utterance, and the two are compared. Experiments showed that this structural re-ranking is highly effective.
Free Research Field	Speech science / Speech engineering
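The prosody-based re-ranking idea described in the outline can be illustrated with a minimal sketch. It is not the project's implementation and rests on stated assumptions: each hypothesis carries a recognizer score and a sequence of prosodic labels predicted from its text (here, hypothetical H/L pitch labels per mora), the input utterance yields an estimated label sequence, and the combined score subtracts a weighted, length-normalized edit distance between the two sequences. The names `Hypothesis`, `edit_distance`, `prosody_mismatch`, and `rerank`, the H/L label set, and the weight value are all illustrative assumptions; the project's actual features (accent phrase boundaries, accent nucleus positions, structural features) and its learning method for weights (the keywords list the averaged perceptron and CRF) are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float      # recognizer log score (higher is better)
    prosody: list[str]    # hypothetical per-mora labels predicted from the text


def edit_distance(a, b):
    """Levenshtein distance between two label sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def prosody_mismatch(predicted, observed):
    """Edit distance normalized by the longer sequence, giving a value in [0, 1]."""
    if not predicted and not observed:
        return 0.0
    return edit_distance(predicted, observed) / max(len(predicted), len(observed))


def rerank(hyps, observed, weight=5.0):
    """Sort hypotheses by recognizer score minus a weighted prosody mismatch.

    The fixed weight is an assumption; in practice it would be tuned or learned.
    """
    def combined(h):
        return h.asr_score - weight * prosody_mismatch(h.prosody, observed)
    return sorted(hyps, key=combined, reverse=True)


if __name__ == "__main__":
    observed = ["L", "H", "H", "L"]  # labels assumed to be estimated from the utterance
    hyps = [
        Hypothesis("hyp A", -10.0, ["H", "L", "L", "L"]),
        Hypothesis("hyp B", -10.5, ["L", "H", "H", "L"]),
    ]
    for h in rerank(hyps, observed):
        print(h.text)
```

In this example, hypothesis B overtakes hypothesis A: its recognizer score is 0.5 lower, but its predicted labels match the estimated ones exactly, while A's labels differ by a normalized distance of 0.75 (3 edits over 4 labels), which costs it 3.75 points at the assumed weight. Edit distance is used here only because it tolerates hypotheses whose label sequences differ in length; a real system would more likely align labels to the acoustic segmentation.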