Co-Investigator (Kenkyū-buntansha) |
HIROSE Keikichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
HARADA Yasunari Waseda University, School of Law, Professor (80189711)
YAMAUCHI Yutaka Tokyo International University, School of Business and Commerce, Associate Professor (30306245)
KOCHIYAMA Akiko Chubu University, Department of Humanity and Social Sciences, Associate Professor (80350990)
MAKINO Takehiko Chuo University, Faculty of Economics, Associate Professor (00269482)
|
Research Abstract |
The Ministry of Education, Culture, Sports, Science and Technology in Japan announced that English education would be introduced into primary schools in 2011. This means that the number of Japanese students of English will increase drastically in 2011, yet the number of English teachers is far from sufficient for those students. To address this problem, this study built a new technique for supporting young students' English learning and assessing their pronunciation. Children's voices are very difficult to process adequately with current speech technology. For example, if an assessment system is built with adult speech samples, it cannot deal with children's voices adequately because of the large acoustic difference between adults' and children's voices. If a large number of samples of children's voices were available, a good system would be possible, but recording young children is a very heavy task. Speaker adaptation technology can be used to adapt a system for adults into a system for a specific young child (student). In this case, however, bad pronunciation may be judged as good because of over-adaptation to that specific student. To solve these problems completely, we proposed a new speech technique that represents an utterance by removing the acoustic features that reflect the age and gender of the speaker. In the proposed method, only the timbre contrasts are extracted from speech events, where each contrast is measured as a Bhattacharyya distance because this distance is invariant under any invertible transformation, linear or non-linear. Since speaker differences can be described as acoustic transformations of voices, the proposed representation is speaker-invariant. Using this new structural representation, a system for assessing English vowels produced by students of any age was built. The system has four functions: 1) recording or logging changes in individual students' vowel systems caused by training, 2) classification of learners purely based on pronunciation variation, irrespective of age and gender, 3) generation of instructions on which vowels to correct first, and 4) a highly motivating user interface for pronunciation training. During a three-year period, a prototype system was tested and evaluated in many locations, such as high schools, junior high schools and primary schools. Over 500 students aged from 3 to 70 took part in our pronunciation test. The analysis results showed the very high validity of the proposed method and system. Further, using the data, we classified the more than 500 students based on their pronunciations, irrespective of age and gender, and defined 5 typical Japanese pronunciations of English.
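The invariance property that the abstract relies on can be checked numerically in a simple setting. The following is a minimal sketch, not the project's implementation: it assumes each speech event is modelled as a Gaussian distribution, and it models a speaker difference as an invertible affine map, which covers only the linear case of the transformations mentioned above. All function names (`bhattacharyya_gaussian`, `structure_matrix`, `affine_transform`, `demo_invariance`) and the synthetic data are hypothetical and chosen for illustration only.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T cov^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln( det(cov) / sqrt(det(cov1) * det(cov2)) )
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def structure_matrix(means, covs):
    """Pairwise distance matrix over all events (the contrasts between events)."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya_gaussian(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist


def affine_transform(means, covs, A, b):
    """Apply x -> A x + b to every Gaussian (assumed stand-in for a speaker change)."""
    return [A @ m + b for m in means], [A @ c @ A.T for c in covs]


def demo_invariance():
    """Build 5 synthetic 3-dimensional events and compare both distance matrices."""
    rng = np.random.default_rng(0)
    dim, n_events = 3, 5
    means = [rng.normal(size=dim) for _ in range(n_events)]
    covs = []
    for _ in range(n_events):
        m = rng.normal(size=(dim, dim))
        covs.append(m @ m.T + dim * np.eye(dim))  # symmetric positive definite
    # Fixed invertible matrix (its determinant is non-zero) and an offset.
    A = np.array([[2.0, 0.5, 0.0],
                  [-0.3, 1.5, 0.4],
                  [0.1, -0.2, 0.8]])
    b = np.array([1.0, -2.0, 0.5])
    new_means, new_covs = affine_transform(means, covs, A, b)
    return structure_matrix(means, covs), structure_matrix(new_means, new_covs)


if __name__ == "__main__":
    original, transformed = demo_invariance()
    print("distance matrices agree:", np.allclose(original, transformed))
```

In this sketch the individual means and covariances change under the map, but the matrix of pairwise distances does not, which is the sense in which a representation built only from contrasts can be independent of the speaker.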
|