2017 Fiscal Year Final Research Report
Research on construction and application of high discriminative speech feature space using heterogeneous speech units and multiple languages
Project/Area Number |
15K00262
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Perceptual information processing
|
Research Institution | National Institute of Advanced Industrial Science and Technology |
Principal Investigator |
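Lee Shi-wook National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Senior Researcher (50415642)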
|
Co-Investigator(Kenkyū-buntansha) |
Ito Yoshiaki (伊藤 慶明) Iwate Prefectural University, Faculty of Software and Information Science, Professor (90325928)
|
Project Period (FY) |
2015-10-21 – 2018-03-31
|
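Keywords | Speech information processing / Pattern recognition / Human interface / Time series analysis / Statistical pattern recognition / Information retrieval / Multivariate analysis / Intelligent information processing |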
Outline of Final Research Achievements |
This research aims to improve speech recognition performance by enhancing the discriminative ability of the speech feature space using heterogeneous information. Because most speech recognition systems built with recent deep learning techniques are based on a single speech unit, speech diversity cannot be sufficiently modeled even with enormous amounts of speech data. To address this problem, we adopt a sub-phonetic segment unit, which is a temporally extended speech unit and is completely different from the conventional context-dependent speech unit. We confirmed that the proposed highly discriminative speech feature space based on heterogeneous speech units is effective across a wide range of speech recognition systems, from conventional generative models to leading-edge deep learning models.
|
Free Research Field |
Informatics
|