Project/Area Number | 15300026 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Media informatics/Database |
Research Institution | University of Tsukuba |
Principal Investigator | TANAKA Kazuyo, University of Tsukuba, Graduate School of Library, Information and Media Studies, Professor (70344207) |
Co-Investigator(Kenkyū-buntansha) |
ITOH Yoshiaki, Iwate Prefectural University, Faculty of Software and Information Science, Associate Professor (90325928)
OKAWA Shigeki, Chiba Institute of Technology, Faculty of Information Science, Department of Information and Network Science, Associate Professor (40306395)
KOJIMA Hiroaki, National Institute of Advanced Industrial Science and Technology, Information Technology Research Institute, Research Group Leader (80356980)
|
Project Period (FY) | 2003 – 2005 |
Project Status | Completed (Fiscal Year 2005) |
Budget Amount |
¥16,500,000 (Direct Cost: ¥16,500,000)
Fiscal Year 2005: ¥4,700,000 (Direct Cost: ¥4,700,000)
Fiscal Year 2004: ¥5,200,000 (Direct Cost: ¥5,200,000)
Fiscal Year 2003: ¥6,600,000 (Direct Cost: ¥6,600,000)
|
Keywords | speech recognition / spoken document retrieval / phonetic code / IPA / dynamic programming / phone model / multilingual / open vocabulary / universal phonetic code / speech acoustic model / speech summarization |
Research Abstract |
In this project, we present a novel speech processing framework in which all acoustic speech samples are first encoded into universal phonetic segment (UPS) sequences, and spoken document processing (SDP) systems, such as recognition, retrieval, and indexing systems, are then built on this UPS domain. In this framework, the SDP systems are decoupled from the acoustic correlates and environments of the original speech. This provides flexibility: recognition-type processing can be performed simply by calculating distances between UPS sequences, and the systems can also be built on distributed processing schemes. Through this project, we developed the following component techniques for this framework: 1) an original set of fine sub-phonetic segments (SPS) used as the UPS set, which yielded high recognition performance and made multilingual speech easy to process, and 2) effective DP (dynamic programming)-based sequence matching algorithms, called Shift CDP and Relay CDP. The effectiveness of the processing framework, the SPS set, and the DP-based algorithms was evaluated by building speech recognition and open-vocabulary spoken document retrieval (SDR) systems. Experimental results showed that the proposed SDP systems outperformed systems based on conventional methods. Finally, we built a real-time open-vocabulary SDR system for demonstration, which retrieves broadcast video in response to a user's spoken query.
|
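The abstract states that, once speech is encoded as UPS sequences, recognition-type processing reduces to computing distances between symbol sequences with DP-based matching. The following is a minimal sketch of that idea, assuming symbolic phone labels and a hypothetical 0/1 substitution cost. It is a generic free-start DP spotting (approximate substring matching) and not the project's Shift CDP or Relay CDP, whose details are not given in this record; the function name `spot_query` and the example symbols are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def spot_query(
    query: Sequence[str],
    document: Sequence[str],
    # Hypothetical symbol distance: 0 for identical labels, 1 otherwise.
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
    indel_cost: float = 1.0,
) -> List[Tuple[int, float]]:
    """For every document position j, return (j, cost), where cost is the
    minimum alignment cost of the whole query against some document segment
    ending at j. The segment may start anywhere (free start), so the query
    can be spotted inside a long document sequence."""
    m = len(query)
    # prev[i]: best cost of aligning query[:i] to a segment ending at the
    # previous column; before the document starts, only deletions are possible.
    prev = [i * indel_cost for i in range(m + 1)]
    results: List[Tuple[int, float]] = []
    for j, d in enumerate(document):
        cur = [0.0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + sub_cost(query[i - 1], d),  # match / substitute
                prev[i] + indel_cost,  # extra document symbol inserted
                cur[i - 1] + indel_cost,  # query symbol deleted
            )
        results.append((j, cur[m]))
        prev = cur
    return results


if __name__ == "__main__":
    q = ["k", "a", "t", "o"]
    doc = ["a", "k", "a", "t", "o", "s"]
    print(min(spot_query(q, doc), key=lambda r: r[1]))  # (4, 0.0)
```

In a retrieval setting, the document would be the encoded symbol sequence of the broadcast audio, and positions whose cost falls below a threshold would be treated as candidate hits; the threshold and cost function here are assumptions for illustration.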