
2015 Fiscal Year Final Research Report

Improvement of large vocabulary speech recognition performance based on high-precision lexical prosody prediction

Research Project

Project/Area Number 25540064
Research Category

Grant-in-Aid for Challenging Exploratory Research

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution The University of Tokyo

Principal Investigator

Minematsu Nobuaki  The University of Tokyo, Graduate School of Engineering, Professor (90273333)

Project Period (FY) 2013-04-01 – 2016-03-31
Keywords speech recognition / prosodic features / accent phrase boundary / accent nucleus position / re-ranking / Average perceptron / CRF / structural representation
Outline of Final Research Achievements

Japanese has the distinctive characteristic that lexical prosody often varies when words are combined. In speech recognition research, re-ranking is often used to re-evaluate the multiple recognition hypotheses generated by a recognizer and select the final one. We expected that re-ranking would improve if the lexical prosody predicted from each hypothesis were compared with the lexical prosody estimated from the input utterance. We successfully implemented 1) lexical prosody prediction from hypotheses and 2) re-ranking of hypotheses based on lexical prosody, but it proved extremely difficult to build a module that can estimate lexical prosody information precisely from an utterance alone. We therefore turned to another strategy, applying quasi-prosody to re-ranking. In the new strategy, structural features are predicted from the hypotheses and are also estimated from the input utterance. Experiments showed that structural re-ranking is highly effective.
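The re-ranking idea described above can be illustrated with a minimal Python sketch. It is not the project's implementation. The `Hypothesis` type, the use of plain numeric feature vectors, the squared-distance `mismatch` measure, and the single hand-set `weight` are all assumptions made for illustration; the report does not state how the features were represented or how the scores were combined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """One entry of an n-best list (hypothetical structure)."""
    text: str
    recognizer_score: float               # e.g. a log-likelihood; higher is better
    predicted_features: Sequence[float]   # features predicted from the hypothesis text


def mismatch(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Euclidean distance between predicted and observed features (assumed measure)."""
    if len(predicted) != len(observed):
        raise ValueError("feature vectors must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def rerank(hypotheses: List[Hypothesis],
           observed: Sequence[float],
           weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by recognizer score minus a weighted feature mismatch, best first."""
    def combined(h: Hypothesis) -> float:
        return h.recognizer_score - weight * mismatch(h.predicted_features, observed)
    return sorted(hypotheses, key=combined, reverse=True)


# Toy n-best list: "a" has the better recognizer score,
# but the features predicted from "b" match the utterance.
nbest = [
    Hypothesis("a", -10.0, [0.0, 1.0]),
    Hypothesis("b", -10.5, [1.0, 0.0]),
]
observed_features = [1.0, 0.0]   # features estimated from the input utterance
best = rerank(nbest, observed_features)[0]
```

In this toy case the hypothesis whose predicted features agree with the utterance overtakes the one the recognizer scored higher. In practice the combination weights would be learned from data rather than set by hand; the project keywords mention an averaged perceptron, but the training setup is not described in this summary.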

Free Research Field

Speech science / speech engineering

Published: 2017-05-10  
