Co-Investigator (Kenkyū-buntansha) |
MASUKO Takashi, Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Dept. of Information Processing, Research Associate (90272715)
TOKUDA Keiichi, Nagoya Institute of Technology, Faculty of Engineering, Dept. of Computer Science, Associate Professor (20217483)
|
Research Abstract |
The main purpose of this research is to realize a text-to-speech synthesis system, based on hidden Markov models (HMMs), that can generate speech with various voice characteristics. We have obtained the following results.

1. Modeling of phonetic and prosodic information of speech based on HMMs. We have proposed a new kind of HMM, called the multi-space probability distribution HMM (MSD-HMM), which can model the pitch pattern of speech without heuristic assumptions. We have also proposed a technique in which spectrum, pitch, and state duration are modeled simultaneously in a unified HMM framework.

2. Speech parameter generation from HMMs. We have extended the algorithm for generating speech parameters from HMMs to the general case in which the state sequence, or a part of it, is latent, and have derived a new algorithm for that case. We have also derived a pitch pattern generation algorithm based on the MSD-HMM.

3. Realization of a text-to-speech synthesis system based on HMMs. We have developed a Japanese text-to-speech synthesis system that runs on workstations and PCs. It is based on the simultaneous modeling of spectrum, pitch, and duration by HMMs and on speech parameter generation from HMMs.

4. Speech synthesis with various voice characteristics. We have proposed voice characteristics conversion techniques for the HMM-based speech synthesis system using speaker adaptation techniques for HMMs, such as MAP/VFS and MLLR. We have also proposed a speaker interpolation technique that interpolates HMM parameters among the HMM sets of representative speakers. Using these techniques, we have shown that the HMM-based speech synthesis system can generate speech with various voice characteristics.
|
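The speaker interpolation idea in result 4 can be illustrated with a minimal sketch. The abstract does not give the interpolation formula, so the code below assumes the simplest possible rule: each state has a single Gaussian output distribution with diagonal covariance, and both means and variances are combined linearly with weights that sum to 1. The names `GaussianState` and `interpolate_states` are hypothetical and chosen only for this illustration; this is not necessarily the rule used in the project.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GaussianState:
    """Output distribution of one HMM state (hypothetical type).

    Assumes a single Gaussian with diagonal covariance, so the
    covariance is stored as a per-dimension variance list.
    """
    mean: List[float]
    var: List[float]


def interpolate_states(states: Sequence[GaussianState],
                       weights: Sequence[float]) -> GaussianState:
    """Linearly interpolate corresponding states of several speakers.

    Assumption: means and variances are both weighted sums, and the
    weights sum to 1. Other interpolation rules are possible.
    """
    if not states or len(states) != len(weights):
        raise ValueError("need one weight per speaker state")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")

    dim = len(states[0].mean)
    if any(len(s.mean) != dim or len(s.var) != dim for s in states):
        raise ValueError("all states must have the same dimensionality")

    # Weighted sum over speakers, computed independently per dimension.
    mean = [sum(w * s.mean[d] for s, w in zip(states, weights))
            for d in range(dim)]
    var = [sum(w * s.var[d] for s, w in zip(states, weights))
           for d in range(dim)]
    return GaussianState(mean=mean, var=var)


# Two representative speakers, one 2-dimensional state each (toy values).
speaker_a = GaussianState(mean=[0.0, 2.0], var=[1.0, 1.0])
speaker_b = GaussianState(mean=[4.0, 6.0], var=[3.0, 1.0])

# A voice that is 25% speaker A and 75% speaker B.
mixed = interpolate_states([speaker_a, speaker_b], [0.25, 0.75])
```

In a full system the same operation would be applied to every corresponding state of the speakers' HMM sets, and the interpolated set would then be passed to parameter generation; varying the weights moves the synthesized voice between the representative speakers.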