Project/Area Number |
16200011
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Kyoto University |
Principal Investigator |
KAWAHARA Tatsuya, Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
|
Co-Investigator(Kenkyū-buntansha) |
MINOH Michihiko, Kyoto University, Academic Center for Computing and Media Studies, Professor (70166099)
FURUI Sadaoki, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Department of Computer Science, Professor (90293076)
AKITA Yuya, Kyoto University, Academic Center for Computing and Media Studies, Assistant Professor (90402742)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥45,110,000 (Direct Cost: ¥34,700,000, Indirect Cost: ¥10,410,000)
Fiscal Year 2006: ¥12,220,000 (Direct Cost: ¥9,400,000, Indirect Cost: ¥2,820,000)
Fiscal Year 2005: ¥14,170,000 (Direct Cost: ¥10,900,000, Indirect Cost: ¥3,270,000)
Fiscal Year 2004: ¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000)
|
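The budget figures above are internally consistent: in each fiscal year the indirect cost is exactly 30% of the direct cost, and the three yearly amounts add up to the project total. The short Python check below uses only the numbers listed in this record; the names `budget`, `yearly_totals` and `is_thirty_percent` are illustrative.

```python
# Yearly budget figures from this record, in yen: year -> (direct, indirect).
budget = {
    2004: (14_400_000, 4_320_000),
    2005: (10_900_000, 3_270_000),
    2006: (9_400_000, 2_820_000),
}

def yearly_totals(budget):
    """Return {fiscal_year: direct + indirect} for each year."""
    return {year: direct + indirect for year, (direct, indirect) in budget.items()}

def is_thirty_percent(direct, indirect):
    """Exact integer check that indirect == 0.3 * direct (no float rounding)."""
    return indirect * 10 == direct * 3

totals = yearly_totals(budget)
grand_total = sum(totals.values())
direct_total = sum(direct for direct, _ in budget.values())
indirect_total = sum(indirect for _, indirect in budget.values())

print(totals)
print(grand_total, direct_total, indirect_total)
print(all(is_thirty_percent(d, i) for d, i in budget.values()))
```

Integer arithmetic is used throughout so that the 30% check is exact rather than subject to floating-point rounding.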
Keywords | Speech processing / Speech recognition / Speech archives / Spontaneous speech / Meta-data extraction / Oral presentation / Lecture / Discussion |
Research Abstract |
We investigated automatic speech recognition of oral presentations at academic meetings, university lectures, and discussions in TV programs and parliament, together with post-processing of the resulting transcripts. Such spontaneous speech shows large variation in pronunciation and a wide variety of colloquial expressions, so it requires elaborate modeling and robust statistical training. Moreover, because topics and vocabulary vary widely, the lexicon and language model must also be adapted to each lecture or discussion. To this end, we conducted the following studies:
- Generalized statistical modeling of pronunciation variation
- Transformation of the language model into spoken style based on a statistical machine translation framework
- Language model adaptation based on probabilistic latent semantic analysis (PLSA) of topics and speakers
- Language model adaptation using slide information
- Language model adaptation based on topic segmentation of meetings

The transcripts (speech recognition results) of spontaneous speech are not suitable for archiving as they are: disfluencies and colloquial expressions must be cleaned up, and sentence boundaries must be marked. We therefore conducted the following studies:
- Sentence boundary detection using dependency structure analysis
- Clause boundary detection using local syntactic dependency
- Detection of quotations and inserted clauses
- Detection and correction of self-repairs

We also conducted the following studies for effective indexing of speech archives:
- Indexing of key sentences in oral presentations
- Alignment of utterances with slides used in lectures
|
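The abstract names PLSA-based language model adaptation without describing the formulation. One widely used way to apply a PLSA-derived topic unigram to an n-gram model is unigram rescaling: each probability P(w | h) is multiplied by (P_topic(w) / P_background(w)) ** beta and then renormalized over the vocabulary for that history h. The Python sketch below illustrates only that generic technique and is not necessarily the method used in this project. The function `rescale`, the toy probabilities, and the exponent `beta` are all hypothetical.

```python
from typing import Dict

def rescale(ngram: Dict[str, float],
            background_unigram: Dict[str, float],
            topic_unigram: Dict[str, float],
            beta: float = 0.5) -> Dict[str, float]:
    """Unigram rescaling of P(w | h) for a single history h (generic sketch).

    Each word's n-gram probability is multiplied by
    (P_topic(w) / P_background(w)) ** beta, and the result is
    renormalized so that it sums to 1 over the vocabulary.
    """
    scaled = {
        w: p * (topic_unigram[w] / background_unigram[w]) ** beta
        for w, p in ngram.items()
    }
    z = sum(scaled.values())  # per-history normalization term Z(h)
    return {w: s / z for w, s in scaled.items()}

# Hypothetical toy distributions over a three-word vocabulary.
p_next = {"speech": 0.4, "recognition": 0.3, "weather": 0.3}        # P(w | h)
p_background = {"speech": 0.3, "recognition": 0.3, "weather": 0.4}  # background unigram
# Hypothetical topic unigram (e.g. inferred for one lecture) favoring speech terms.
p_topic = {"speech": 0.5, "recognition": 0.4, "weather": 0.1}

adapted = rescale(p_next, p_background, p_topic, beta=0.5)
print({w: round(p, 3) for w, p in adapted.items()})
```

Words that the topic model favors relative to the background gain probability mass, and others lose it. Setting `beta=0` leaves the original n-gram distribution unchanged, which makes the exponent a convenient knob for how strongly the adaptation is applied.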