Development of voice montage system.
Project/Area Number |
16300061
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Tohoku Institute of Technology |
Principal Investigator |
KIDO Hiroshi Tohoku Institute of Technology, Faculty of Engineering, Associate Professor (00356172)
|
Co-Investigator (Kenkyū-buntansha) |
KASUYA Hideki International University of Health and Welfare, The School of Health Science, Professor (20006240)
SHIGENO Sumi Aoyama Gakuin University, College of Literature, Professor (20162589)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥14,300,000 (Direct Cost: ¥14,300,000)
Fiscal Year 2006: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2005: ¥5,400,000 (Direct Cost: ¥5,400,000)
Fiscal Year 2004: ¥5,300,000 (Direct Cost: ¥5,300,000)
|
Keywords | Voice quality / Utterance style / Speech synthesis / Everyday expressions / Phonetic memory / Auditory impression / Extra-linguistic information / Para-linguistic information / Sound quality |
Research Abstract |
The ultimate aim of this research is to realize a voice montage system, which uses speech synthesis to reproduce an utterance of a person as a listener remembers it. With the support of this grant, we explored six fundamental problems that must be solved to achieve this aim. The outcomes are described below.
(1) Everyday expressions associated with voice quality and speaking style: Careful extraction from various lexical sources yielded 1,102 expressions.
(2) Acoustic correlates of the expressions: Acoustic correlates of vocal age, which have remained an unsolved problem in speech synthesis, were investigated extensively; speaking rate, pitch contour, and irregularities in the speech waveform were found to be of vital importance.
(3) Memory of the utterance: Cross-linguistic perceptual experiments showed that memory of the voice quality of an utterance was almost independent of the language of the utterance.
(4) Presentation strategy of voice samples: Audio-visual interaction was found to be significant, suggesting the importance of combining voice with a visual display in realizing the voice montage.
(5) Similarity assessment of synthetic speech: Similarity of the voice quality of utterances was found to be preserved in memory for a long period of time, but the speaker individuality of the utterances was not necessarily maintained.
(6) Flexible speech synthesis technology: The ARX analysis-synthesis method was further improved, in particular its GUI, allowing the user to manipulate synthesized speech more flexibly and easily.
These outcomes were integrated into an experimental voice montage system, which primarily focuses on reproducing the voice quality of an utterance. Further work on problem (1), especially on exploring expressions related to speaking style, is necessary to attain the final goal of the project.
|
Report
(4 results)
Research Products
(69 results)