Research on Quality Improvement of Synthesized Speech
Project/Area Number |
15500118
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Aichi Prefectural University |
Principal Investigator |
KANAMORI Yasukazu, Aichi Prefectural University, Faculty of Information Science and Technology, Associate Professor (50230868)
|
Project Period (FY) |
2003 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥3,000,000 (Direct Cost: ¥3,000,000)
Fiscal Year 2004: ¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 2003: ¥2,000,000 (Direct Cost: ¥2,000,000)
|
Keywords | Speech synthesis / Phoneme chain / Connection distortion / Rhythm control / Improvement of sound quality / Naturalness / Transition area / Segmentation / Quality improvement / Chinese speech synthesis / Chinese synthesis |
Research Abstract |
Speech is the most familiar means of human communication. Meanwhile, as China opens up, more people have opportunities to encounter the Chinese language, and demand for a high-quality Chinese speech synthesis system is strong. Such systems are still at the research stage, however, and this study was undertaken in response to that need. It addresses reduction of database size, improvement of sound quality, and clarification of phoneme chains from phonological and related perspectives. The following points were examined. First, phoneme chain data had to be built in order to examine the influence of phoneme chains. Because the aim was to reduce connection distortion in speech synthesis, only a few samples were prepared for sounds with little connection distortion, such as plosives, as in earlier studies, while many patterns were prepared for voiced sounds (vowel, nasal vowel, and semivowel intervals). Initially, recordings were made of a single male speaker of Standard Chinese, and phoneme segmentation was performed on these data. For each phoneme chain, a three-point pattern was adopted: the start point and end point of the transition area, plus the center of the transition area, which earlier studies have used for vowel transitions. For example, approximating the formants and pitch within the transition area with a polynomial curve gave more natural formant synthesis than a straight-line approximation. Many earlier studies used a sound source model for formant synthesis; this study instead tried to use the original sound source as far as possible, in order to contribute to improved sound quality. Future work will compare the speech obtained by formant synthesis with speech produced by STRAIGHT, a non-parametric synthesizer. In addition, because female speech is said to be difficult to synthesize with a formant synthesizer, this will be checked with these methods as more speakers are added to expand the data. To improve the quality of synthesized speech further, the algorithm that controls rhythm patterns over long units such as phrases or sentences also needs to be improved.
|
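The comparison in the abstract between a polynomial curve and a straight-line approximation over the transition area can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the polynomial degree of 2, and the time and formant values are all hypothetical, chosen only to show how a three-point pattern (start, center, end) constrains each kind of approximation.

```python
import numpy as np

def fit_transition(times, values, degree=2):
    """Fit a polynomial through the control points of a transition area.

    With three points and degree 2 (an assumption, not stated in the
    abstract) the curve passes exactly through start, center, and end.
    """
    coeffs = np.polyfit(times, values, degree)
    return np.poly1d(coeffs)

def linear_transition(times, values):
    """Straight line from the start point to the end point only."""
    t0, t1 = times[0], times[-1]
    v0, v1 = values[0], values[-1]
    return lambda t: v0 + (v1 - v0) * (np.asarray(t, dtype=float) - t0) / (t1 - t0)

# Hypothetical three-point pattern: time in seconds, second formant in Hz.
times = np.array([0.00, 0.03, 0.06])
f2 = np.array([1200.0, 1700.0, 1900.0])

curve = fit_transition(times, f2)
line = linear_transition(times, f2)

# Sample both approximations on a common grid across the transition area.
grid = np.linspace(times[0], times[-1], 7)
poly_track = curve(grid)
line_track = line(grid)

for t, p, l in zip(grid, poly_track, line_track):
    print(f"t={t:.3f}s  polynomial={p:7.1f} Hz  straight line={l:7.1f} Hz")
```

The straight line ignores the center point, whereas the quadratic honors all three, which is one plausible reading of why the curved approximation sounded more natural. The same fit could be applied to a pitch contour by passing pitch values in place of `f2`.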
Report
(3 results)
Research Products
(2 results)