Project/Area Number |
16200016
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Advanced Telecommunications Research Institute International |
Principal Investigator |
KATO Hiroaki Advanced Telecommunications Research Institute International, ATR, Cognitive Information Science Laboratories, Senior Researcher (20374093)
|
Co-Investigator(Kenkyū-buntansha) |
SAGISAKA Yoshinori Waseda University, GITI, Professor (70339737)
TSUZAKI Minoru Kyoto City University of Arts, Faculty of Music, Associate Professor (60155356)
YAMADA Reiko ATR, Supervisor (30395090)
TAJIMA Keiichi Hosei University, Faculty of Letters, Associate Professor (70366821)
|
Project Period (FY) |
2004 – 2007
|
Project Status |
Completed (Fiscal Year 2007)
|
Budget Amount |
¥49,400,000 (Direct Cost: ¥38,000,000, Indirect Cost: ¥11,400,000)
Fiscal Year 2007: ¥11,570,000 (Direct Cost: ¥8,900,000, Indirect Cost: ¥2,670,000)
Fiscal Year 2006: ¥13,130,000 (Direct Cost: ¥10,100,000, Indirect Cost: ¥3,030,000)
Fiscal Year 2005: ¥14,300,000 (Direct Cost: ¥11,000,000, Indirect Cost: ¥3,300,000)
Fiscal Year 2004: ¥10,400,000 (Direct Cost: ¥8,000,000, Indirect Cost: ¥2,400,000)
|
Keywords | Perceptual and Cognitive Learning / Spoken Language Processing / Hearing and Audition / Temporal Aspect / Prosody / Acquisition of Spoken Language / International Information Exchange / Canada : Thailand : United States of America |
Research Abstract |
The purpose of this project was to find the cues by which humans retrieve the temporal structure of speech, to understand how those cues are used, and to establish a quantitative method for evaluating the temporal adequacy or naturalness of a given speech sound that can replicate human judgment. For this purpose, three fundamental investigative tasks were carried out: a study at the psychophysical level, a study at the linguistic level, and the construction of an evaluation model. One distinguishing feature of this project is that it emphasized the psychophysical aspects nearly as much as the linguistic ones, even though its primary object was spoken language. Since a person can easily recognize fast speech, even in a foreign language whose meaning is unknown, it is assumed that processing the temporal aspects of speech involves non-linguistic, and therefore language-independent, activities. By concentrating on processing that is independent of any given language, we developed the basic technology for a system with small language-processing overhead and simple extensibility to multiple languages. The major results follow. (I) Psychophysical level: An algorithm was developed to predict temporal reference points in a given speech signal by replicating the function of human auditory processing. This algorithm's most important benefit is its applicability to a virtually unlimited range of languages, since it does not require any linguistic knowledge. (II) Linguistic level: An empirical study revealed that the factors affecting the perception of prosodic units vary depending on the particular language's choice of units. This finding has practical implications for how much weight should be placed on prosodic factors when designing effective foreign-language training methods. (III) Modeling: By integrating auditory functions derived from the investigation at the psychophysical level, a mathematical model was implemented to automatically evaluate the naturalness of speech produced by learners of English. The model's performance closely approximated the subjective evaluation of a native-speaking English instructor. This result not only suggests the importance of psychophysical factors in evaluating the adequacy or naturalness of speech, but also implies that the proposed model could be extended to multiple languages.
|