Budget Amount |
¥2,500,000 (Direct Cost: ¥2,500,000)
Fiscal Year 2000: ¥1,100,000 (Direct Cost: ¥1,100,000)
Fiscal Year 1999: ¥1,400,000 (Direct Cost: ¥1,400,000)
|
Research Abstract |
The papers collected in this report describe research performed under a grant entitled "Non-lexical Sounds : a New Interface Modality for Voice-based Information Delivery Systems", funded for fiscal 1999--2000. Over the past five years interactive voice response (IVR) systems have become ubiquitous in the United States and are making inroads in Japan. Indeed, it is becoming impossible to get train schedules, apartment information, call routing, flight information, weather information, and so on from a real person. One reason these systems are universally hated is the need to listen to menus, navigate through them, and push buttons to select content, but this difficulty is being resolved thanks to the deployment of speech recognition technology. The second problem with these systems is that the information is provided in fixed chunks, lasting from a few seconds to a few tens of seconds, and the user is essentially forced to listen as the system plays back a chunk. In contrast, people providing information over the telephone are much more flexible. One aspect of this is that they are responsive to feedback from the listener. In particular, in many dialog types the listener frequently produces back-channels, such as uh-huh, uh, yeah-yeah, oh, ummmm, and so on, and the information provider adapts his presentation in response. Thus the aim of this project was to discover the meanings and functions of such non-lexical conversational sounds, and to exploit them in voice-based information delivery systems.
The first component of the project was basic research into the meanings and functions of non-lexical sounds in conversation in two languages, Japanese and English. This led to a new understanding of these sounds. For English, this was formalized as a model in which these items are explained not as fixed words but as dynamic creations, generated by a simple model consisting of 10 component sounds and 2 combining rules. The semantic component of the model is sound-symbolic: each of these component sounds bears some meaning or function that is fairly constant across grunts and across contexts, and the meaning of a conversational grunt is largely the sum of the meanings of its phonetic components.
The second component of the project was the use of these sounds in a tutorial system. This system produced non-lexical sounds, suitably varied according to context, demonstrating their utility in goal-oriented dialogs.
The third component of the project was a study of the use of non-lexical sounds in a real-time control application, in which the computer responded in real time to non-lexical advice from the user, such as mm-mm-mm or ack!.
Topics which remain on the agenda are: 1. a model of the meanings of the prosodic features of non-lexical sounds, 2. a model of non-lexical sounds in conversational Japanese, 3. a system which responds to these sounds during number-giving in a directory-assistance-type IVR system, and 4. a spoken-dialog salestalk-type system which adapts its presentation based on these sounds.
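The compositional account sketched above (component sounds with roughly constant meanings, with a grunt's meaning being largely the sum of its components' meanings) can be illustrated with a minimal Python sketch. The inventory, the meanings assigned to each sound, and the combining operation below are illustrative placeholders only; they are not the actual 10 component sounds or 2 combining rules of the project's model.

```python
# Illustrative sketch of a sound-symbolic, compositional model of conversational
# grunts: each component sound carries a roughly constant meaning, and the
# meaning of a grunt is approximated as the sum (here, the union) of the
# meanings of its components. All entries below are hypothetical examples.

COMPONENT_MEANINGS = {
    "u": {"acknowledgement"},               # hypothetical mapping
    "h": {"mild surprise"},                 # hypothetical mapping
    "m": {"thinking / holding the floor"},  # hypothetical mapping
    "n": {"agreement"},                     # hypothetical mapping
}

def grunt_meaning(components: list[str]) -> set[str]:
    """Return the summed meaning of a grunt built from component sounds."""
    meaning = set()
    for c in components:
        meaning |= COMPONENT_MEANINGS.get(c, set())
    return meaning

# Example: a back-channel like "uh-huh" might be analyzed as /u/ + /h/ + /u/,
# giving a combined meaning of acknowledgement plus mild surprise.
print(grunt_meaning(["u", "h", "u"]))
# {'acknowledgement', 'mild surprise'}
```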
|