Improvement of large vocabulary speech recognition performance based on high-precision lexical prosody prediction
Project/Area Number |
25540064
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Perceptual information processing
|
Research Institution | The University of Tokyo |
Principal Investigator |
Minematsu Nobuaki The University of Tokyo, Graduate School of Engineering, Professor (90273333)
|
Project Period (FY) |
2013-04-01 – 2016-03-31
|
Project Status |
Completed (Fiscal Year 2015)
|
Budget Amount |
¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000)
Fiscal Year 2014: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Fiscal Year 2013: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
|
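Keywords | speech recognition / prosodic features / accent phrase boundaries / accent nucleus position / re-ranking / Average perceptron / CRF / structural representation / hypothesis search / accent nucleus |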
Outline of Final Research Achievements |
Japanese has the distinctive characteristic that lexical prosody often changes when words are combined. In speech recognition research, re-ranking is often used to re-evaluate multiple recognition hypotheses generated by a recognizer and select the final one. In re-ranking, comparing the lexical prosody predicted from each hypothesis with that estimated from the input utterance is expected to improve the selection. We successfully implemented 1) lexical prosody prediction from hypotheses and 2) re-ranking of hypotheses based on lexical prosody. However, it proved extremely difficult to build a module that estimates lexical prosody precisely from the utterance alone. We therefore turned to another strategy: applying quasi-prosody to re-ranking. In this strategy, structural features are predicted from the hypotheses and also estimated from the input utterance. Experiments showed that structural re-ranking is highly effective.
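The prosody-based re-ranking idea described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Hypothesis` type, the 0/1 accent-phrase boundary labels, the `agreement` feature, and the toy data are all hypothetical, and the sketch assumes that the labels predicted from each hypothesis and those estimated from the utterance are already aligned position by position. The weights for the recognizer score and the prosody-agreement feature are learned with an averaged perceptron (one of the listed keywords), which moves the weights toward the oracle hypothesis whenever the current weights select a different one.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hypothesis:
    text: str
    asr_score: float                 # recognizer log score (hypothetical field)
    predicted_boundaries: List[int]  # 1 = accent-phrase boundary, predicted from the text


def agreement(pred: Sequence[int], est: Sequence[int]) -> float:
    """Fraction of aligned positions where predicted and estimated labels match."""
    n = min(len(pred), len(est))
    if n == 0:
        return 0.0
    return sum(p == e for p, e in zip(pred, est)) / n


def features(h: Hypothesis, est: Sequence[int]) -> List[float]:
    # Feature vector: [recognizer score, prosody agreement with the utterance]
    return [h.asr_score, agreement(h.predicted_boundaries, est)]


def score(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def rerank(nbest: List[Hypothesis], est: Sequence[int], w: Sequence[float]) -> int:
    # Index of the highest-scoring hypothesis (ties go to the earlier one)
    return max(range(len(nbest)), key=lambda i: score(w, features(nbest[i], est)))


# Each training item: (N-best list, estimated boundaries, index of oracle hypothesis)
TrainItem = Tuple[List[Hypothesis], List[int], int]


def train_averaged_perceptron(data: List[TrainItem], epochs: int = 10) -> List[float]:
    w = [0.0, 0.0]
    w_sum = [0.0, 0.0]
    steps = 0
    for _ in range(epochs):
        for nbest, est, oracle in data:
            best = rerank(nbest, est, w)
            if best != oracle:
                # Move weights toward the oracle and away from the wrong choice
                fo = features(nbest[oracle], est)
                fb = features(nbest[best], est)
                w = [wi + a - b for wi, a, b in zip(w, fo, fb)]
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            steps += 1
    # Return the weight vector averaged over all training steps
    return [s / steps for s in w_sum]


# Toy example: the recognizer prefers hypothesis 0, but hypothesis 1's predicted
# accent-phrase boundaries match those estimated from the utterance.
est = [0, 1, 0, 0, 1]
nbest = [
    Hypothesis("hyp A", -10.0, [0, 0, 0, 1, 1]),
    Hypothesis("hyp B", -10.5, [0, 1, 0, 0, 1]),
]
w = train_averaged_perceptron([(nbest, est, 1)])
choice = rerank(nbest, est, w)  # selects index 1 ("hyp B")
```

In this single-list toy example the recognizer-score weight comes out negative simply because the oracle has the lower recognizer score; meaningful weights would require training over many N-best lists.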
|