Speech Synthesis Method for Flexible Voice Quality Control

Research Project

Project/Area Number	08680386
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Utsunomiya University
Principal Investigator	KASUYA Hideki, Utsunomiya University, Faculty of Engineering, Professor (20006240)
Co-Investigator(Kenkyū-buntansha)	YANG Chang-Sheng, Utsunomiya University, Faculty of Engineering, Assistant (80272219)
Project Period (FY)	1996 – 1997
Project Status	Completed (Fiscal Year 1997)
Budget Amount	¥2,600,000 (Direct Cost: ¥2,600,000); Fiscal Year 1997: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 1996: ¥1,900,000 (Direct Cost: ¥1,900,000)
Keywords	Speech Synthesis / Voice Quality / Individuality / ARX Analysis / Formant / Voice Source Characteristics / Hoarse Voice / Whisper / Formant Synthesis
Research Abstract	Flexible voice quality control in speech synthesis covers not only qualities such as whispery, breathy, and tense voice but also talker individuality, which results from physiological differences in the speech organs. The major aim of this research project is to establish a basis for realizing such control in speech synthesis. This year we concentrated on synthesis strategies for generating whispered, breathy, harsh, and tense voices, as well as various talker individualities, using the ARX (auto-regressive with exogenous input) speech analysis-synthesis method developed last year. For whispered voice, we investigated the acoustic mechanism underlying its characteristic formant structure and, based on MRI (magnetic resonance imaging) measurements of the larynx and computer simulation of the acoustic resonance of the vocal tract, arrived at a new theory that explains the frequency shift of the lower formants. To produce breathy voice, we proposed a method for controlling the voicing source parameters and the amount of laryngeal noise. For harsh voice, we first developed an analysis-conversion-synthesis system that allows the jitter, shimmer, spectral fluctuation, and laryngeal noise characteristics of speech to be manipulated, and then studied how each of these parameters contributes to the perception of harsh voice. The experiments showed cross effects among these parameters, that is, they interact in producing harsh voice quality. Tense voice was successfully generated by controlling the open quotient and spectral tilt of the voicing source waveform. Talker individuality was found to be related largely to the static characteristics of formant trajectories and less to their dynamics.
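The source-filter strategy summarized in the abstract (a voicing source whose open quotient, spectral tilt, jitter, shimmer, and noise are controlled, driving a formant-based vocal tract through an ARX difference equation) can be illustrated with a minimal sketch. It is not the project's implementation and rests on several assumptions: the glottal pulse is a KLGLOTT88-style polynomial flow (a swapped-in choice), spectral tilt is a one-pole low-pass, laryngeal noise is plain white noise, and the vocal tract is an all-pole filter built from hypothetical formant values. The names `glottal_flow`, `voicing_source`, `formant_ar`, `arx_synthesize`, and the sampling rate `FS` are all illustrative.

```python
import numpy as np

# Hypothetical sketch only: the source model, tilt filter, noise model, and
# formant values below are assumptions, not the project's ARX system.
FS = 10_000  # sampling rate in Hz (assumed for illustration)


def glottal_flow(period, oq):
    """One pitch period of a KLGLOTT88-style polynomial glottal flow.

    During the open phase Te = oq * period the flow is (t/Te)^2 - (t/Te)^3,
    scaled to a peak of 1; during the closed phase it is zero.
    """
    te = max(2, int(round(oq * period)))
    t = np.arange(period) / te
    flow = np.where(t <= 1.0, t**2 - t**3, 0.0)
    return flow * 27.0 / 4.0  # t^2 - t^3 peaks at 4/27 when t = 2/3


def voicing_source(n_periods, f0, oq, jitter=0.0, shimmer=0.0, tilt=0.0, seed=0):
    """Concatenated glottal pulses with cycle-to-cycle perturbation.

    jitter, shimmer: relative standard deviation of period length and of
    amplitude from cycle to cycle.
    tilt: coefficient (0 <= tilt < 1) of a one-pole low-pass filter used
    here as a hypothetical spectral-tilt control.
    """
    rng = np.random.default_rng(seed)
    t0 = FS / f0
    cycles = []
    for _ in range(n_periods):
        period = max(4, int(round(t0 * (1.0 + jitter * rng.standard_normal()))))
        amp = 1.0 + shimmer * rng.standard_normal()
        cycles.append(amp * glottal_flow(period, oq))
    flow = np.concatenate(cycles)
    src = np.diff(flow, prepend=0.0)  # first difference approximates lip radiation
    out = np.empty_like(src)
    state = 0.0
    for n, x in enumerate(src):
        state = (1.0 - tilt) * x + tilt * state  # one-pole low-pass (tilt)
        out[n] = state
    return out


def formant_ar(formants, bandwidths):
    """AR coefficients a[0..p] (a[0] = 1) of an all-pole vocal tract filter
    with one complex pole pair per formant (frequencies and bandwidths in Hz)."""
    a = np.array([1.0])
    for f, b in zip(formants, bandwidths):
        r = np.exp(-np.pi * b / FS)   # pole radius from bandwidth
        theta = 2.0 * np.pi * f / FS  # pole angle from center frequency
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a


def arx_synthesize(a, u, noise_level=0.0, seed=1):
    """Run s[n] + sum_{k=1}^{p} a[k] s[n-k] = u[n] + e[n].

    u is the exogenous voicing-source input; e is white noise standing in
    for laryngeal (aspiration) noise.
    """
    rng = np.random.default_rng(seed)
    e = noise_level * rng.standard_normal(len(u))
    p = len(a) - 1
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = u[n] + e[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s


# Illustrative /a/-like formants; these values are not from the project.
A = formant_ar([700, 1200, 2600], [80, 90, 120])
modal = arx_synthesize(A, voicing_source(20, 120.0, oq=0.6, tilt=0.3))
tense = arx_synthesize(A, voicing_source(20, 120.0, oq=0.4, tilt=0.0))
breathy = arx_synthesize(A, voicing_source(20, 120.0, oq=0.8, tilt=0.6), noise_level=0.02)
harsh = arx_synthesize(
    A, voicing_source(20, 120.0, oq=0.6, jitter=0.03, shimmer=0.1, tilt=0.3), noise_level=0.01
)
```

In this sketch, lowering the open quotient and removing the tilt filter gives a crude analogue of the tense setting, while raising jitter, shimmer, and additive noise gives a crude analogue of harsh voice. The perceptual cross effects among these parameters reported in the abstract are not modeled here.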

Report (3 results)

- 1997 Annual Research Report
- 1997 Final Research Report Summary
- 1996 Annual Research Report

Research Products: Publications (17 results)

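[Publications] 松田勝敬, 粕谷英樹: "ささやき声の音響理論" ("An acoustic theory of whisper") 日本音響学会平成9年度秋季研究発表会講演論文集 (Proceedings of the 1997 Autumn Meeting of the Acoustical Society of Japan). I. 299-300 (1997)
- Description
  From the Final Research Report Summary (Japanese version)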
- Related Report
  1997 Final Research Report Summary
[Publications] C.S. Yang and H. Kasuya: "Automatic estimation of formant and voice source parameters using a subspace based algorithm" Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (in press). (1998)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  1997 Final Research Report Summary
[Publications] W. Zhu and H. Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans., Fundamentals of Electronics, Communications and Computer Sciences. E81-A, No. 2. 268-274 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  1997 Final Research Report Summary
[Publications] W. Zhu and H. Kasuya: "A speech analysis-synthesis-editing system based on the ARX speech production model" J. Acoust. Soc. Jpn. (E). Vol. 19, No. 3 (in press). (1998)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  1997 Final Research Report Summary
[Publications] T. Ohtsuka, C.S. Yang and H. Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of the International Congress on Acoustics. (in press). (1998)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  1997 Final Research Report Summary
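[Publications] 遠藤康男, 粕谷英樹: "周期ごとのゆらぎを考慮した音声の分析・変換・合成システム" ("A speech analysis-conversion-synthesis system accounting for cycle-to-cycle fluctuations") 電子情報通信学会論文誌A (IEICE Transactions A). (accepted for publication). (1998)
- Description
  From the Final Research Report Summary (Japanese version)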
- Related Report
  1997 Final Research Report Summary
[Publications] C.S. Yang and H. Kasuya: "Automatic estimation of formant and voice source parameters using a subspace based algorithm" Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (in press). (1998)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  1997 Final Research Report Summary
[Publications] W. Zhu and H. Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans., Fundamentals of Electronics, Communications and Computer Sciences. E81-A, No. 2. 268-274 (1998)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  1997 Final Research Report Summary
[Publications] W. Zhu and H. Kasuya: "A speech analysis-synthesis-editing system based on the ARX production model" J. Acoust. Soc. Jpn. (E). Vol. 19, No. 3 (in press). (1998)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  1997 Final Research Report Summary
[Publications] T. Ohtsuka, C.S. Yang and H. Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of the International Congress on Acoustics. (in press). (1998)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  1997 Final Research Report Summary
[Publications] M. Matsuda and H. Kasuya: "An acoustic theory about whisper" Proceedings of the 1997 Autumn Meeting of the Acoustical Society of Japan. 299-300 (1997)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  1997 Final Research Report Summary
[Publications] C.S. Yang and H. Kasuya: "Automatic estimation of formant and voice source parameters using a subspace based algorithm" Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] W. Zhu and H. Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans., Fundamentals of Electronics, Communications and Computer Sciences. E81-A, No. 2. 268-274 (1998)
- Related Report
  1997 Annual Research Report
[Publications] W. Zhu and H. Kasuya: "A speech analysis-synthesis-editing system based on the ARX speech production model" J. Acoust. Soc. Jpn. (E). Vol. 19, No. 3 (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] T. Ohtsuka, C.S. Yang and H. Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of the International Congress on Acoustics. (in press). (1998)
- Related Report
  1997 Annual Research Report
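[Publications] 松田勝敬, 粕谷英樹: "ささやき声の音響理論" ("An acoustic theory of whisper") 日本音響学会平成9年度秋季研究発表会講演論文集 (Proceedings of the 1997 Autumn Meeting of the Acoustical Society of Japan). I. 299-300 (1997)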
- Related Report
  1997 Annual Research Report
[Publications] W. Ding et al.: "Fast and robust joint estimation of vocal tract and voice source parameters" Proc. ICASSP 97. (1997)
- Related Report
  1996 Annual Research Report

Speech Synthesis Method for Flexible Voice Quality Control

Principal Investigator

KASUYA Hideki Utsunomiya University Faculty of Engineering, Professor, 工学部, 教授 (20006240)

¥2,600,000 (Direct Cost: ¥2,600,000)

Report

Research Products

[Publications] 松田 勝敬, 粕谷 英樹: "ささやき声の音響理論" 日本音響学会平成9年度秋季研究発表会講演論文集. I. 299-300 (1997)

Description

Related Report

[Publications] C.S. Yang and H.Kasuya: "Automatic estimation of formant and voice source paramenters using a subspace based algorithm" Proceeding of IEEE International Confernce on Acoustics,Speech,and Signal Processing. (印刷中). (1998)

Description

Related Report

[Publications] W.Zhu and H.Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans.,Fundamentals of Electronics,Communications and Computer Sciences. E81-A,2. 268-274 (1998)

Description

Related Report

[Publications] W.Zhu and H.Kasuya: "A speech analysis-synthesis-editing system based on the ARX speech production model" J.Acosut.Soc.Jpn.(E). 19,3(印刷中). (1998)

Description

Related Report

[Publications] T.Ohtuska, C.S. Yang and H.Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of International Congress on Acoustics. (印刷中). (1998)

Description

Related Report

[Publications] 遠藤 康男, 粕谷 英樹: "周期ごとのゆらぎを考慮した音声の分析・変換・合成システム" 電子情報通信学会論文誌A. (掲載決定). (1998)

Description

Related Report

[Publications] C.S.Yang and H.Kasuya: "Automatic estimation of formant and voice source parameters using a subspace based algorithm" Proceeding of IEEE International Conference on Acoustics, Speech, and Signal Preocessing. (in print.). (1998)

Description

Related Report

[Publications] W.Zhu and H.Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans., Fundamentals of Electronics, Communications and Computer Sciences. E81-A,No.2. 268-274 (1998)

Description

Related Report

[Publications] W.Zhu and H.Kasuya: "A speech analysis-synthesis-editing system based on the ARX production model" J.Acoust.Soc.Jpn.(E). Vol.19, No.3 (in print.). (1998)

Description

Related Report

[Publications] T.Ohtsuka, C.S.yang and H.Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of Int.Congress Acoust.(in print.). (1998)

Description

Related Report

[Publications] M.Matsuda and H.Kasuya: "An acoustic theory about whisper" Proceedings of The 1997 Autumn Meeting of the Acoustical Society of Japan. 299-300 (1997)

Description

Related Report

[Publications] C.S.Yang and H.Kasuya: "Automatic estimation of formant and voice source parameters using a subspace based algorithm" Proceeding of IEEE International Conference on Acoustics,Speech,and Signal Processing. (印刷中). (1998)

Related Report

[Publications] W.Zhu and H.Kasuya: "Perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality" IEICE Trans.,Fundamentals of Electronics,Communications and Computer Sciences. E81-A,2. 268-274 (1998)

Related Report

[Publications] W.Zhu and H.Kasuya: "A speech analysis-synthesis-editing system based on the ARX speech production model" J.Acoust.Soc.Jpn.(E). 19,3(印刷中). (1998)

Related Report

[Publications] T.Ohtsuka, C.S.Yang and H.Kasuya: "Automatic creation of CV templates for formant type speech synthesis based on HMM-based segmentation and syllable boundary detection" Proceedings of Int.Congress Acoust.(印刷中). (1998)

Related Report

[Publications] 松田勝敬、粕谷英樹: "ささやき声の音響理論" 日本音響学会平成9年度秋季研究発表会講演論文集. 1. 299-300 (1997)

Related Report

[Publications] W.Ding,et al.: "Fast and robust joint estimation of vocal tract and voice source parameters" Proc. ICASSP97. (1997)

Related Report

[Publications] 松田勝敬, 粕谷英樹: "ささやき声の音響理論" 日本音響学会平成9年度秋季研究発表会講演論文集. I. 299-300 (1997)

[Publications] 遠藤康男, 粕谷英樹: "周期ごとのゆらぎを考慮した音声の分析・変換・合成システム" 電子情報通信学会論文誌A. (掲載決定). (1998)