Project/Area Number |
06044055
|
Research Category |
Grant-in-Aid for International Scientific Research
|
Allocation Type | Single-year Grants |
Section | Joint Research |
Research Institution | The University of Tokyo |
Principal Investigator |
WARD Nigel  The University of Tokyo, Graduate School of Engineering, Department of Mechano-Informatics, Associate Professor (00242008)
WARD Nigel G. (1995)  The University of Tokyo, Faculty of Engineering, Associate Professor
|
Co-Investigator(Kenkyū-buntansha) |
TAJCHMAN Gary  International Computer Science Institute, Speech Group, Researcher
MORGAN Nelson  International Computer Science Institute, Speech Group / University of California, College of Engineering, Researcher, Professor
JURAFSKY Dan  International Computer Science Institute, Speech Group / University of California, College of Engineering, Researcher, Associate Professor
TERADA Minoru  The University of Tokyo, Graduate School of Engineering, Department of Mechano-Informatics, Associate Professor (80163921)
INOUE Hirochika  The University of Tokyo, Graduate School of Engineering, Department of Mechano-Informatics, Professor (50111464)
|
Project Period (FY) |
1994 – 1995
|
Project Status |
Completed (Fiscal Year 1995)
|
Budget Amount |
¥4,200,000 (Direct Cost: ¥4,200,000)
Fiscal Year 1995: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 1994: ¥2,800,000 (Direct Cost: ¥2,800,000)
|
Keywords | User Interface / Speech Understanding / Speech Input / Natural Language Understanding / AIZUCHI (back-channel) / Multimodal / Speech / Noise / English / Japanese / Grammar |
Research Abstract |
We are interested in the use of spoken language in human-computer interaction. The inspiration is the fact that, in human-human interaction, meaningful exchanges can take place even without accurate recognition of the words the other is saying; this is possible thanks to shared knowledge and complementary communication channels, especially gesture and prosody. We want to exploit this fact for human-machine interfaces. We are therefore doing three things:
1. Using simple speech recognition to augment graphical user interfaces, well integrated with other input modalities: keyboard, mouse, and touch screen.
2. Building systems able to engage in simple conversations, using mostly prosodic cues. To sketch our latest success: we conjectured that, for Japanese, it would be possible to decide when to produce many back-channel utterances based on prosodic cues alone, without reference to meaning. We found that neither vowel lengthening, volume changes, nor energy level (to detect when the other speaker had finished) was by itself a good predictor of when to produce an aizuchi. The best predictor was a low pitch level. Specifically, upon detecting the end of a region of pitch less than 0.9 times the local median pitch that had continued for 150 ms, coming after at least 600 ms of speech, the system predicted an aizuchi 200 ms to 300 ms later, provided it had not done so within the preceding 1 second. We also built a real-time system based on this decision rule. A human confederate steered the conversation to a suitable topic and then switched on the system; after switch-on, the confederate's utterances and the system's outputs, mixed together, formed one side of the conversation. None of the 5 subjects realized that their conversation partner had become partially automated.
3. Building tools and collecting data to support 1 and 2.
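The decision rule in point 2 above can be read as a small frame-based state machine. Below is a minimal sketch of that rule, assuming a 10 ms pitch-frame spacing, a pitch tracker that reports 0 Hz for unvoiced frames, and a caller-supplied speech/non-speech flag; the class, method, and parameter names are illustrative and not taken from the original system.

```python
from collections import deque
from statistics import median

FRAME_MS = 10            # assumed pitch-frame spacing
LOW_PITCH_RATIO = 0.9    # "low" means below 0.9 x the local median pitch
LOW_REGION_MS = 150      # the low-pitch region must have lasted 150 ms
MIN_SPEECH_MS = 600      # at least 600 ms of preceding speech
RESPONSE_DELAY_MS = 250  # respond 200-300 ms later (midpoint used here)
REFRACTORY_MS = 1000     # no aizuchi within the preceding 1 second

class AizuchiPredictor:
    """Hypothetical re-implementation of the prosody-only back-channel rule."""

    def __init__(self, median_window_ms=2000):
        # Recent voiced-frame pitches, used to estimate the local median.
        self.pitch_history = deque(maxlen=median_window_ms // FRAME_MS)
        self.low_run_ms = 0               # length of the current low-pitch region
        self.speech_ms = 0                # length of the current stretch of speech
        self.since_last_ms = REFRACTORY_MS
        self.pending_ms = None            # countdown until a scheduled aizuchi

    def step(self, pitch_hz, is_speech):
        """Process one frame; return True when an aizuchi should be produced now."""
        self.since_last_ms += FRAME_MS

        voiced = pitch_hz > 0
        if voiced:
            self.pitch_history.append(pitch_hz)
            local_median = median(self.pitch_history)

        in_low_region = voiced and pitch_hz < LOW_PITCH_RATIO * local_median
        if in_low_region:
            self.low_run_ms += FRAME_MS
        else:
            self._maybe_schedule()        # the low-pitch region, if any, just ended
            self.low_run_ms = 0

        # Simplified tracking of how long the interlocutor has been speaking.
        self.speech_ms = self.speech_ms + FRAME_MS if is_speech else 0

        if self.pending_ms is not None:
            self.pending_ms -= FRAME_MS
            if self.pending_ms <= 0:
                self.pending_ms = None
                self.since_last_ms = 0
                return True               # produce the aizuchi on this frame
        return False

    def _maybe_schedule(self):
        # End of a >= 150 ms low-pitch region, after >= 600 ms of speech,
        # and no aizuchi produced or pending within the last second.
        if (self.low_run_ms >= LOW_REGION_MS
                and self.speech_ms >= MIN_SPEECH_MS
                and self.since_last_ms >= REFRACTORY_MS
                and self.pending_ms is None):
            self.pending_ms = RESPONSE_DELAY_MS
```

In use, step() would be called once per 10 ms frame with the current pitch estimate and a speech/non-speech flag; a True return value is the cue to play a pre-recorded aizuchi.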
|