A scheme for continuous speech recognition in a large context based on the human process of spoken language recognition

Research Project

Project/Area Number	03452164
Research Category	Grant-in-Aid for General Scientific Research (B)
Allocation Type	Single-year Grants
Research Field	Information Engineering
Research Institution	Science University of Tokyo
Principal Investigator	FUJISAKI Hiroya Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Professor (80010776)
Co-Investigator(Kenkyū-buntansha)	HARADA Tetsuya Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Lecturer (80189703); ITOH Kohji Science University of Tokyo, Faculty of Industrial Science and Technology, Dept. of Applied Electronics, Professor (20013683); HIROSE Keikichi University of Tokyo, Faculty of Engineering, Dept. of Electronic Engineering, Associate Professor (50111472)
Project Period (FY)	1991 – 1992
Project Status	Completed (Fiscal Year 1992)
Budget Amount	¥7,000,000 (Direct Cost: ¥7,000,000); Fiscal Year 1992: ¥1,600,000 (Direct Cost: ¥1,600,000); Fiscal Year 1991: ¥5,400,000 (Direct Cost: ¥5,400,000)
Keywords	Spoken Language / Human Processes of Recognition / Large Context / Continuous Speech / Speech Recognition System / Syntactic Information / Semantic Information / Discourse Information / Recognition Process / Human / Mental Lexicon / Lexical Access
Research Abstract	Most current systems for automatic speech recognition fail to achieve recognition performance comparable to that of human listeners, because they are constructed without paying attention to the human processes of spoken language recognition. From this point of view, the present study investigates the human processes and incorporates the findings into a scheme for automatic recognition of continuous speech in a large context. The main results are as follows.
1. Experimental investigation and modeling of the human processes of spoken language recognition. Using as stimuli natural utterances with controlled acoustic, syntactic and semantic information, the following findings were obtained on the human processes of spoken language recognition. (1) The unit of speech recognition varies widely from phones and syllables to words and phrases, depending on the experimental condition and context. (2) Larger units generally require less accuracy of representation for correct recognition. (3) The amount of acoustic information necessary for recognition of a given unit varies widely depending on the size of the context and the prior knowledge of the listener. (4) The accuracy and speed of access to the mental lexicon vary dynamically depending on the acoustic, syntactic, semantic and discourse information available to the listener. Based on these findings, a model has been constructed for the human processes of spoken language recognition.
2. Proposal and implementation of a scheme for automatic recognition of spoken language. Based upon the above findings and the model, a scheme for automatic recognition of continuous speech in a large context has been proposed, featuring (1) use of units of multiple sizes and multiple accuracies of acoustic feature representation, (2) use of prosodic features for word and phrase boundary detection, and (3) extraction of syntactic, semantic, and idiosyncratic information from a large context. The main components of the system have been implemented.
3. Demonstration of the validity of the proposed scheme. The proposed scheme has been tested by recognition experiments on phones, syllables and words in continuous speech with a large context, and the results have confirmed the essential validity and feasibility of the proposed scheme.

Report

(3 results)

1992 Annual Research Report Final Research Report Summary
1991 Annual Research Report

Research Products
(20 results)

All Publications (20 results)

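[Publications] Hiroya Fujisaki: "A study on the size of the temporal unit for representing the acoustic features in automatic speech recognition" Reports of 1991 Autumn Meeting of the Acoustical Society of Japan. 1. 153-154 (1991)
- Description
  From the "Summary of Research Results Report (Japanese)"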
- Related Report
  1992 Final Research Report Summary
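[Publications] Nobuaki Minematsu: "Automatic speech recognition using multiple temporal units and accuracy of representation for the acoustic features" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. 1. 31-32 (1992)
- Description
  From the "Summary of Research Results Report (Japanese)"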
- Related Report
  1992 Final Research Report Summary
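[Publications] Sumio Ohno: "Utilization of lexical information at multiple levels in template matching of words and phrases in continuous speech" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. 1. 95-96 (1992)
- Description
  From the "Summary of Research Results Report (Japanese)"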
- Related Report
  1992 Final Research Report Summary
[Publications] Fujisaki Hiroya: "The influence of semantic and syntactic information on spoken sentence recognition" Proceedings of the 1992 International Conference on Spoken Language Processing. 1. 153-156 (1992)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1992 Final Research Report Summary
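[Publications] Nobuaki Minematsu: "The influence of higher-level linguistic information on continuous speech perception" Transactions of Committee on Hearing Research, Acoustical Society of Japan. H-92-56. 1-6 (1992)
- Description
  From the "Summary of Research Results Report (Japanese)"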
- Related Report
  1992 Final Research Report Summary
[Publications] Fujisaki Hiroya: "A scheme for automatic recognition of continuous speech in a large context based on human processes of spoken language recognition" Proceedings of EUROSPEECH 93. (1993)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1992 Final Research Report Summary
[Publications] Hiroya Fujisaki: "A study on the size of the temporal unit for representing the acoustic features in automatic speech recognition" Reports of 1991 Autumn Meeting of the Acoustical Society of Japan. vol. 1. 153-154 (1991)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Nobuaki Minematsu: "Automatic speech recognition using multiple temporal units and accuracy of representation for the acoustic features" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. vol. 1. 31-32 (1992)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Sumio Ohno: "Utilization of lexical information at multiple levels in template matching of words and phrases in continuous speech" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. vol. 1. 95-96 (1992)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Hiroya Fujisaki: "The influence of semantic and syntactic information on spoken sentence recognition" Proceedings of the 1992 International Conference on Spoken Language Processing. vol. 1. 153-156 (1992)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Nobuaki Minematsu: "The influence of higher-level linguistic information on continuous speech perception" Transactions of Committee on Hearing Research, Acoustical Society of Japan. vol. H-92, no. 56. (1992)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Hiroya Fujisaki: "A scheme for automatic recognition of continuous speech in a large context based on human processes of spoken language processing" Proceedings of EUROSPEECH 93, Berlin. (1993)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1992 Final Research Report Summary
[Publications] Fujisaki, Hiroya: "The influence of semantic and syntactic information on spoken sentence recognition" Proceedings of the 1992 International Conference on Spoken Language Processing. 1. 153-156 (1992)
- Related Report
  1992 Annual Research Report
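[Publications] Nobuaki Minematsu: "The influence of higher-level linguistic information on continuous speech perception" Transactions of Committee on Hearing Research, Acoustical Society of Japan. H-92-56. 1-6 (1992)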
- Related Report
  1992 Annual Research Report
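[Publications] Nobuaki Minematsu: "An experimental study of the influence of semantic content on the process of speech perception" Reports of Autumn Meeting of the Acoustical Society of Japan. 1. (1992)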
- Related Report
  1992 Annual Research Report
[Publications] Fujisaki, Hiroya: "A scheme for automatic recognition of continuous speech in a large context based on human processes of spoken language recognition" Proceedings of EUROSPEECH 93. (1993)
- Related Report
  1992 Annual Research Report
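[Publications] Hiroya Fujisaki: "A study on the size of the temporal unit for representing the acoustic features in automatic speech recognition" Reports of 1991 Autumn Meeting of the Acoustical Society of Japan. 1. 153-154 (1991)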
- Related Report
  1991 Annual Research Report
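[Publications] Nobuaki Minematsu: "Automatic speech recognition using multiple temporal units and accuracy of representation for the acoustic features" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. 1. 31-32 (1992)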
- Related Report
  1991 Annual Research Report
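[Publications] Sumio Ohno: "Utilization of lexical information at multiple levels in template matching of words and phrases in continuous speech" Reports of 1992 Spring Meeting of the Acoustical Society of Japan. 1. 95-96 (1992)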
- Related Report
  1991 Annual Research Report
[Publications] Fujisaki, H.: "A method for automatic speech recognition based on findings of the human process of speech perception" Proceedings of the 1992 International Conference on Spoken Language Processing (Banff, Canada). (1992)
- Related Report
  1991 Annual Research Report
