Designing an ultra-hispeed search engine for big data of spoken documents
Project/Area Number |
22300060
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Waseda University (2012-2013), Toyohashi University of Technology (2010-2011) |
Principal Investigator |
NITTA Tsuneo Waseda University, Green Computing Systems Research Organization, Professor (70314101)
|
Co-Investigator(Kenkyū-buntansha) |
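KOUICHI Katsurada Toyohashi University of Technology, International Exchange Center, Associate Professor (80324490)
IRIBE Yurie Toyohashi University of Technology, Information and Media Center, Assistant Professor (40397500)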
|
Co-Investigator(Renkei-kenkyūsha) |
YURIE Iribe Aichi Prefectural University, School of Information Science and Technology, Assistant Professor (40397500)
|
Project Period (FY) |
2010-04-01 – 2013-03-31
|
Project Status |
Completed (Fiscal Year 2013)
|
Budget Amount |
¥13,780,000 (Direct Cost: ¥10,600,000, Indirect Cost: ¥3,180,000)
Fiscal Year 2012: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2011: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2010: ¥4,940,000 (Direct Cost: ¥3,800,000, Indirect Cost: ¥1,140,000)
|
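Keywords | speech information processing / big data / fast search / suffix array / phoneme recognition / articulatory feature extraction / deep neural network / dual space / spoken document retrieval / large-scale spoken documents / high-accuracy phoneme recognition / articulatory features / keyword division |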
Research Abstract |
We developed a fast spoken term detection method for big data of spoken documents. The work focused on (1) accurate speech-to-phoneme conversion even when an utterance contains out-of-vocabulary words, and (2) fast spoken term detection that tolerates the ambiguity caused by speech recognition errors. For (1), phoneme features are first extracted in a dual space, a multi-layer perceptron then extracts articulatory features, and a subsequent phoneme classifier discriminates phonemes with high accuracy. For (2), we implemented an iterative deepening search based on suffix arrays, continuous DP matching, and a keyword division algorithm. As a result, three issues were addressed at the same time: search accuracy, search speed, and index size.
|
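Illustrative sketch (not the project's implementation): the abstract names three ingredients for part (2), a suffix array index, keyword division, and continuous DP matching. The Python below shows one common way these can fit together, under assumptions that are not stated in the record: phonemes are plain string tokens, the keyword is split into max_errors + 1 pieces so that at least one piece must occur exactly, and each exact hit is verified with a free-start edit-distance DP. All function names are hypothetical, and the iterative deepening step is not reproduced.

```python
def build_suffix_array(seq):
    """Sort all suffix start positions lexicographically (naive; fine for a demo)."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def find_exact(seq, sa, pat):
    """Start positions of exact occurrences of pat, via binary search on sa."""
    m = len(pat)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-token prefix is >= pat
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] < pat:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-token prefix is > pat
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] <= pat:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def spot_distance(keyword, window):
    """Minimum edit distance between keyword and any substring of window."""
    m = len(keyword)
    col = list(range(m + 1))  # matching against an empty window prefix
    best = col[m]
    for token in window:
        new = [0]  # a match may start at any window position at no cost
        for i in range(1, m + 1):
            cost = 0 if keyword[i - 1] == token else 1
            new.append(min(col[i - 1] + cost, col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[m])
    return best


def search(seq, sa, keyword, max_errors):
    """Estimated start positions where keyword matches with <= max_errors edits."""
    m = len(keyword)
    pieces = max_errors + 1  # assumption: pigeonhole-style keyword division
    hits = set()
    for p in range(pieces):
        a, b = p * m // pieces, (p + 1) * m // pieces
        piece = keyword[a:b]
        if not piece:
            continue
        for pos in find_exact(seq, sa, piece):
            lo = max(0, pos - a - max_errors)
            hi = min(len(seq), pos + (m - a) + max_errors)
            if spot_distance(keyword, seq[lo:hi]) <= max_errors:
                hits.add(max(0, pos - a))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical recognizer output in which "ch" was misrecognized as "t".
    text = tuple("a r i g a t o k o N n i t i w a s e k a i".split())
    sa = build_suffix_array(text)
    query = tuple("k o N n i ch i w a".split())
    print(search(text, sa, query, max_errors=1))
```

In this sketch the suffix array only ever answers exact lookups for short pieces, and the DP runs only on small windows around those hits rather than over the whole phoneme sequence. The reported positions are estimates derived from the matching piece, so insertions or deletions before that piece can shift them by a few tokens.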
Report
(4 results)
Research Products
(88 results)