Project/Area Number | 12480083 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Tokyo Institute of Technology |
Principal Investigator | FURUI Sadaoki, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Department of Computer Science, Professor (90293076) |
Co-Investigator (Kenkyū-buntansha) | IWANO Koji, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Assistant Professor (90323823) |
Project Period (FY) | 2000 – 2002 |
Project Status | Completed (Fiscal Year 2002) |
Budget Amount | ¥2,900,000 (Direct Cost: ¥2,900,000) |
Fiscal Year 2002: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2001: ¥2,100,000 (Direct Cost: ¥2,100,000)
Keywords | Ubiquitous / Wearable computing environment / Computer-supported meeting system / Parallel computer / Spoken dialogue / Speech recognition system / Speech contents / Speaker adaptation / Meeting CSCW system / Spontaneous speech / Acoustic back-off / Interactive system / Model training |
Research Abstract |
Research to build speech recognition technology for computer-supported conference systems in a ubiquitous/wearable computing environment has been conducted.

First, methods for building both language models appropriate to spontaneous speech and acoustic models automatically adapted to voice individuality were investigated. Since the cross-talk problem cannot be avoided even if a microphone is attached to each participant of a meeting or discussion, an acoustic backing-off method was tried. In this method, the acoustic score during the cross-talk period is replaced by a mean value over a previous speech period. The proposed method was confirmed to be effective in improving recognition performance.

Second, a parallel computer-based speech recognition system, consisting of multiple recognizers with acoustic models adapted to individual speakers, was built to recognize meeting utterances. Speaker changes are automatically detected during the meeting, and acoustic models are adapted using an unsupervised method. For a new speaker, a speaker-adapted model is incrementally created. The speech recognition result with the maximum likelihood is chosen from the results of the multiple recognizers using the speaker-adapted acoustic models. This method was confirmed to be effective for building a real-time recognition system with good performance.

Third, the parallel computer-based speech recognition system was applied to a mixed-initiative spoken dialogue system accepting multiple topics in parallel, and its effectiveness was also confirmed. Various other issues related to spontaneous speech recognition were also investigated in this research.
|
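The two scoring ideas in the abstract can be sketched in a few lines of Python. This is an illustrative approximation only, not the project's implementation: the frame scores, the cross-talk detector, and the recognizer output structure are all hypothetical assumptions.

```python
# Sketch of two ideas from the abstract (hypothetical data structures):
# 1. acoustic backing-off: during cross-talk frames, replace the acoustic
#    score with the mean score of the preceding clean-speech frames;
# 2. parallel recognition: pick the maximum-likelihood hypothesis among
#    recognizers that each use a different speaker-adapted acoustic model.

def acoustic_backoff(frame_scores, crosstalk_flags):
    """Return per-frame acoustic scores with cross-talk frames backed off
    to the running mean of earlier clean frames."""
    clean_scores = []
    result = []
    for score, is_crosstalk in zip(frame_scores, crosstalk_flags):
        if is_crosstalk and clean_scores:
            result.append(sum(clean_scores) / len(clean_scores))
        else:
            result.append(score)
            clean_scores.append(score)
    return result

def select_best_hypothesis(recognizer_outputs):
    """Choose the hypothesis with maximum log-likelihood among the
    parallel speaker-adapted recognizers."""
    return max(recognizer_outputs, key=lambda out: out["log_likelihood"])

# Third frame is cross-talk, so its score is replaced by mean(-1.0, -2.0).
scores = acoustic_backoff([-1.0, -2.0, -9.0], [False, False, True])

best = select_best_hypothesis([
    {"speaker": "A", "text": "hello", "log_likelihood": -120.5},
    {"speaker": "B", "text": "hello", "log_likelihood": -98.2},
])
```

In the second function, selecting by likelihood implicitly identifies the current speaker, which matches the abstract's description of running one recognizer per speaker-adapted model in parallel.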