Grant-in-Aid for Scientific Research (B).
|Research Institution||Science University of Tokyo|
FUJISAKI Hiroya, Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Professor (80010776)
ITOH Kohji, Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Professor (20013683)
HARADA Tetsuya, Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Lecturer (80189703)
HIROSE Keikichi, University of Tokyo, Faculty of Engineering, Dept. of Electronic Engineering, Associate Professor (50111472)
|Project Fiscal Year
1991 – 1992
Completed (Fiscal Year 1992)
|Budget Amount
¥7,000,000 (Direct Cost : ¥7,000,000)
Fiscal Year 1992 : ¥1,600,000 (Direct Cost : ¥1,600,000)
Fiscal Year 1991 : ¥5,400,000 (Direct Cost : ¥5,400,000)
|Keywords||Spoken Language / Human Processes of Recognition / Large Context / Continuous Speech / Speech Recognition System / Syntactic Information / Semantic Information / Discourse Information / Recognition Process / Human / Mental Lexicon / Lexical Access|
Most current systems for automatic speech recognition fail to achieve recognition performance comparable to that of human listeners, since they are constructed without paying attention to the human processes of spoken language recognition. From this point of view, the present study investigates the human processes and incorporates the findings into a scheme for automatic recognition of continuous speech in a large context. The main results are as follows:
1. Experimental investigation and modeling of the human processes of spoken language recognition
Perceptual experiments using natural utterances with controlled acoustic, syntactic and semantic information as stimuli yielded the following findings on the human processes of spoken language recognition.
(1) The unit of speech recognition varies widely from phones and syllables to words and phrases depending on the experimental condition and context.
(2) Larger units generally require less accuracy of representation for correct recognition.
(3) The amount of acoustic information necessary for recognition of a given unit varies widely depending on the size of context and prior knowledge on the part of the listener.
(4) The accuracy and speed of access to the mental lexicon vary dynamically depending on the acoustic, syntactic, semantic and discourse information available to the listener.
Based on these findings, a model has been constructed for the human processes of spoken language recognition.
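Findings (3) and (4) can be illustrated with a small numerical sketch. The record does not describe the form of the model, so the following is only an assumption for illustration: acoustic evidence and contextual expectation are combined as a simple Bayesian posterior over candidate words. The function name `recognize`, the candidate words and all probabilities are hypothetical.

```python
import math

def recognize(acoustic_loglik, context_prior):
    """Combine acoustic evidence with a context prior (illustrative assumption).

    acoustic_loglik: dict mapping each candidate word to log P(signal | word).
    context_prior:   dict mapping each candidate word to P(word | context);
                     every prior is assumed to be strictly positive.
    Returns (best_word, posterior_dict).
    """
    # Unnormalised log posterior for each candidate word.
    scores = {w: acoustic_loglik[w] + math.log(context_prior[w])
              for w in acoustic_loglik}
    # Normalise with log-sum-exp for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    posterior = {w: math.exp(s - log_z) for w, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical, weakly discriminative acoustic evidence for two similar words.
acoustic = {"speech": math.log(0.45), "beach": math.log(0.55)}

# Without context, the slightly better acoustic match wins.
flat_context = {"speech": 0.5, "beach": 0.5}
best_flat, post_flat = recognize(acoustic, flat_context)

# With a strong contextual expectation, the same acoustic evidence suffices
# to select the contextually plausible word.
strong_context = {"speech": 0.9, "beach": 0.1}
best_ctx, post_ctx = recognize(acoustic, strong_context)
```

In this toy setting the acoustic input is identical in both calls; only the prior changes. That mirrors the observation that the amount of acoustic information needed for correct recognition depends on the context and prior knowledge available to the listener.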
2. Proposal and implementation of a scheme for automatic recognition of spoken language
Based on the above findings and the model, a scheme for automatic recognition of continuous speech in a large context has been proposed. It features (1) the use of recognition units of multiple sizes and of multiple levels of accuracy in acoustic feature representation, (2) the use of prosodic features for word and phrase boundary detection, and (3) the extraction of syntactic, semantic, and idiosyncratic information from a large context. The main components of the system have been implemented.
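The record does not say which prosodic features are used for boundary detection or how they are combined. As a rough sketch under that caveat, the following assumes two common cues: a pause (a run of unvoiced frames) followed by a pitch reset (F0 after the pause noticeably higher than before it). The function name `find_phrase_boundaries`, the thresholds and the sample contour are all hypothetical.

```python
def find_phrase_boundaries(f0, min_pause_frames=3, reset_ratio=1.2):
    """Propose phrase-boundary frames from an F0 contour (illustrative only).

    f0: list of per-frame F0 values in Hz; values <= 0 mark unvoiced frames.
    A boundary is proposed at the first voiced frame after an unvoiced run of
    at least min_pause_frames, provided that F0 there is at least reset_ratio
    times the F0 of the last voiced frame before the run.
    """
    boundaries = []
    i, n = 0, len(f0)
    while i < n:
        if f0[i] > 0:
            i += 1
            continue
        # Scan to the end of the current unvoiced run.
        start = i
        while i < n and f0[i] <= 0:
            i += 1
        run_length = i - start
        # Voiced frames are needed on both sides to compare pitch.
        if start == 0 or i == n or run_length < min_pause_frames:
            continue
        if f0[i] >= reset_ratio * f0[start - 1]:
            boundaries.append(i)
    return boundaries

# Hypothetical contour: declining pitch, a 4-frame pause, a pitch reset,
# then a 1-frame unvoiced gap that is too short to count as a pause.
contour = [180.0, 170.0, 150.0, 120.0, 0.0, 0.0, 0.0, 0.0,
           190.0, 175.0, 160.0, 0.0, 165.0, 150.0]
boundaries = find_phrase_boundaries(contour)
```

A practical system would likely combine such cues with duration and energy patterns and pass the resulting boundary hypotheses to the lexical and syntactic stages. Those details are outside what the record states.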
3. Demonstration of the validity of the proposed scheme
The proposed scheme has been tested in recognition experiments on phones, syllables and words in continuous speech with a large context, and the results have confirmed its essential validity and feasibility.