2017 Fiscal Year Final Research Report

Research on construction and application of high discriminative speech feature space using heterogeneous speech units and multiple languages

Project/Area Number	15K00262
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	National Institute of Advanced Industrial Science and Technology
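Principal Investigator	Lee Shi-wook, National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Senior Researcher (50415642)
Co-Investigator(Kenkyū-buntansha)	Itoh Yoshiaki (伊藤慶明), Iwate Prefectural University, Faculty of Software and Information Science, Professor (90325928)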
Project Period (FY)	2015-10-21 – 2018-03-31
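Keywords	Speech information processing / Pattern recognition / Human interface / Time-series analysis / Statistical pattern recognition / Information retrieval / Multivariate analysis / Intelligent information processing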
Outline of Final Research Achievements	This research aims to improve speech recognition performance by enhancing the discriminative ability of the speech feature space using heterogeneous information. Because most speech recognition systems built with recent deep learning techniques are based on a single speech unit, speech diversity cannot be modeled sufficiently even with enormous amounts of speech data. To address this problem, we adopt the sub-phonetic segment, a temporally extended speech unit that is completely different from the conventional context-dependent speech unit. We confirmed that the proposed highly discriminative speech feature space based on heterogeneous speech units is effective across a wide range of speech recognition systems, from conventional generative models to leading-edge deep learning models.
Free Research Field	Informatics