High-quality Speech Synthesis based on Accurate Analysis Method and Statistical Method

Research Project

Project/Area Number 12480079
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

HIROSE Keikichi  Graduate School of Frontier Sciences, Professor (50111472)

Co-Investigator(Kenkyū-buntansha) MINEMATSU Nobuaki  Graduate School of Information Science and Technology, Associate Professor (90273333)
Project Period (FY) 2000 – 2002
Project Status Completed (Fiscal Year 2002)
Budget Amount
¥10,000,000 (Direct Cost: ¥10,000,000)
Fiscal Year 2002: ¥2,800,000 (Direct Cost: ¥2,800,000)
Fiscal Year 2001: ¥4,400,000 (Direct Cost: ¥4,400,000)
Fiscal Year 2000: ¥2,800,000 (Direct Cost: ¥2,800,000)
Keywords Statistical Speech Synthesis / Terminal Analogue Synthesis / Waveform Concatenative Synthesis / HMM Speech Synthesis / AR-HMM Model / Fundamental Frequency Contour / Generation Process Model / Emotional Speech Synthesis / Waveform Concatenation Synthesis / Glottal Source Waveform Model / Formant Estimation / Statistical Speech Synthesis Method / Segmental Features / Prosodic Features / Dialogue Speech / ARX Analysis / Mora Duration
Research Abstract

The original research plan, which aimed to realize high-quality speech synthesis by using accurate pole-zero information of the vocal tract transfer function for segmental feature generation and by applying functional model constraints for prosodic feature generation, was accomplished with the following results:
1. Successive approximation was applied to ARX analysis, enabling accurate pole-zero estimation. The method was combined with our previously developed terminal analogue synthesizer to construct an analysis-synthesis workbench. Using this workbench, we succeeded in improving the quality of liquid sounds.
2. A speech synthesizer that is a hybrid of terminal analogue and waveform concatenation synthesis was developed, and high-quality speech synthesis was realized.
3. A method for stable formant extraction was developed. It is based on AR-HMM modeling, in which the source waveform is represented by an HMM. A speech synthesis experiment showed that the method could generate high-quality speech even for a large change in F0 (fundamental frequency).
4. By adding natural waveforms of junction periods to the concatenated speech in the spectral domain with appropriate weighting, we realized a smooth spectral transition. We also developed a method that effectively reduces the corpus size for concatenative synthesis using vector quantization (VQ) weighted according to frequency.
5. After developing an HMM speech synthesizer, we investigated the amount of data necessary for speaker adaptation from the viewpoint of speech quality. Good quality was shown to be obtainable with 10 or more sentences.
6. F0 contour generation was realized by estimating the parameters of the generation process model using statistical methods. High speech quality was obtained from only a small speech corpus by using linguistic information such as direct modification relations between words. We also succeeded in estimating accent phrase boundaries from text using the same statistical framework. Furthermore, F0 contour generation and phoneme length estimation were realized for emotional speech with good results.
7. A method for automatically estimating the commands of the F0 contour generation process model was realized. Using the method, a prosodic corpus was built. This corpus is indispensable for the F0 contour generation described in item 6.
8. A rule for controlling mora duration in dialogue-like speech synthesis was constructed. A speech synthesis experiment showed the validity of the rule.
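
Items 6 and 7 refer to a generation process model of F0 contours without giving its equations. As a rough illustration only, the sketch below assumes this is the command-response (Fujisaki) model, in which log F0 is a baseline value plus responses to phrase commands (impulses) and accent commands (steps). The function names and the default constants alpha, beta and gamma are illustrative assumptions and are not taken from this project.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(times, fb, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 (Hz) at each time in `times` (seconds).

    fb          : baseline frequency in Hz (assumed value supplied by the caller)
    phrase_cmds : list of (onset_time, magnitude) impulses
    accent_cmds : list of (onset_time, offset_time, amplitude) steps
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1, beta, gamma)
                            - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour

# Hypothetical example: one phrase command at 0 s, one accent command from 0.3 s to 0.8 s.
times = [i * 0.01 for i in range(200)]
f0 = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

Under this assumption, estimating the "commands" in item 7 would mean recovering the onset times and magnitudes passed here as phrase_cmds and accent_cmds from an observed F0 contour.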

Report

(4 results)
  • 2002 Annual Research Report   Final Research Report Summary
  • 2001 Annual Research Report
  • 2000 Annual Research Report
  • Research Products

    (71 results)

All Publications (71 results)

  • [Publications] Keikichi Hirose: "Temporal rate change of dialogue speech in prosodic units as compared to read speech"Speech Communication. 36・1-2. 97-111 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
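  • [Publications] Keikichi Hirose: "Invitation to Speech Synthesis Research -Toward Realization of Flexible Synthesis-"Information Processing Society of Japan Magazine. 43・3. 321-324 (2002)

    • Description
      From the Final Research Report Summary (Japanese)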
    • Related Report
      2002 Final Research Report Summary
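  • [Publications] Shuichi Narusawa: "A method for automatic extraction of parameters of the fundamental frequency contour generation model"IPSJ (Information Processing Society of Japan) Journal. 43・7. 2155-2168 (2002)

    • Description
      From the Final Research Report Summary (Japanese)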
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Atsuhiro Sakurai: "Data-driven generation of F0 contours using a superpositional model"Speech Communication. (to be published). (2003)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nobuyuki Nishizawa: "Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese"Proc. International Conference on Spoken Language Processing. 1. 725-728 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Atsuhiro Sakurai: "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions"Proc. International Conference on Spoken Language Processing. 3. 259-262 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model"Proc. Speech Prosody 2002. 391-394 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Improved corpus-based synthesis of fundamental frequency contours using generation process model"Proc. International Conference on Spoken Language Processing. 3. 2085-2088 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nobuyuki Nishizawa: "Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model"Proc. International Conference on Spoken Language Processing. 3. 1721-1724 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Corpus-based synthesis of F0 contours for emotional speech using the generation process model"Proceedings 15th International Congress of Phonetic Sciences. (to be published). (2003)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
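  • [Publications] Nobuyuki Nishizawa: "Formant speech synthesis partly using waveform concatenative synthesis"IEICE Technical Report (Speech). 35-42 (2001)

    • Description
      From the Final Research Report Summary (Japanese)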
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nettre Benjamin: "An experimental study on concatenative speech synthesis using a fusion technique and VCV/VV units"IEICE Technical Report (Speech). 53-60 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
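  • [Publications] Keikichi Hirose: "Para- and non-linguistic information in speech information processing"Record of Fall Meeting, Acoust. Soc. Japan. 1. 243-246 (2002)

    • Description
      From the Final Research Report Summary (Japanese)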
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Wentao Gu: "Considerations on acoustic models for HMM-based Mandarin synthesis"Record of Spring Meeting, Acoust. Soc. Japan. (to be published). (2003)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Yesterday and Today for the Spoken Language Researches (Corpus-based synthesis of fundamental frequency contours for TTS systems based on a generation Process model)"TaeHakSa. 680(17) (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose and Hiromichi Kawanami: "Temporal rate change of dialogue speech in prosodic units as compared to read speech"Speech Communication. 36, 1-2. 97-111 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Invitation to Speech Synthesis Research -Toward Realization of Flexible Synthesis-"Information Processing Society of Japan Magazine. 43, 3. 321-324 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Shuichi Narusawa, Nobuaki Minematsu, Keikichi Hirose and Hiroya Fujisaki: "A method for automatic extraction of parameters of the fundamental frequency contour generation model"IPSJ (Information Processing Society of Japan) Journal. 43, 7. 2155-2168 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Atsuhiro Sakurai, Keikichi Hirose and Nobuaki Minematsu: "Data-driven generation of F0 contours using a superpositional model"Speech Communication. to be published. (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nobuyuki Nishizawa, Nobuaki Minematsu and Keikichi Hirose: "Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese"Proc. International Conference on Spoken Language Processing. 1. 725-728 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Atsuhiro Sakurai, Koji Iwano and Keikichi Hirose: "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions"Proc. International Conference on Spoken Language Processing. 3. 259-262 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose, Nobuaki Minematsu, and Masaya Eto: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model"Proc. Speech Prosody. 2002. 391-394 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose, Masaya Eto, and Nobuaki Minematsu: "Improved corpus-based synthesis of fundamental frequency contours using generation process model"Proc. International Conference on Spoken Language Processing. 3. 2085-2088 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nobuyuki Nishizawa, Keikichi Hirose, and Nobuaki Minematsu: "Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model"Proc. International Conference on Spoken Language Processing. 3. 1721-1724 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose, Toshiya Katsura, and Nobuaki Minematsu: "Corpus-based synthesis of F0 contours for emotional speech using the generation process model"Proc. 15th International Congress of Phonetic Sciences. to be published. (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nobuyuki Nishizawa, Nobuaki Minematsu, and Keikichi Hirose: "Formant speech synthesis partly using waveform concatenative synthesis -Experimental study on VCV sounds-"IEICE Technical Report. SP2001-20. 35-42 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Nettre Benjamin, Keikichi Hirose, and Nobuaki Minematsu: "An experimental study on concatenative speech synthesis using a fusion technique and VCV/VV units"IEICE Technical Report. SP2001-121. 53-60 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose: "Para- and non-linguistic information in speech information processing"Record of Fall Meeting, Acoust. Soc. Japan. 1. 243-246 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Wentao Gu, Keikichi Hirose, Nobuaki Minematsu: "Considerations on acoustic models for HMM-based Mandarin synthesis"Record of Spring Meeting, Acoust. Soc. Japan. 1, to be published. (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Keikichi Hirose, Nobuaki Minematsu and Masaya Eto: "Corpus-based synthesis of fundamental frequency contours for TTS systems based on a generation Process model"Yesterday and Today for the Spoken Language Researches, TaeHakSa. 461-477 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Sanghun Kim: "A new Korean corpus-based text-to-speech system"International Journal of Speech Technology. 5・2. 105-116 (2002)

    • Related Report
      2002 Annual Research Report
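  • [Publications] Shuichi Narusawa: "A method for automatic extraction of parameters of the fundamental frequency contour generation model"IPSJ (Information Processing Society of Japan) Journal. 43・7. 2155-2168 (2002)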

    • Related Report
      2002 Annual Research Report
  • [Publications] Nobuaki Minematsu: "Automatic estimation of accentual attribute values of words for accent sandhi rules of Japanese text-to-speech conversion"IEICE Trans. Information and Systems. E86-D・1. 550-557 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Data-driven generation of F0 contours using a superpositional model"Speech Communication. (to be published). (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Keikichi Hirose: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model"Proc. Speech Prosody 2002. 391-394 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Shuichi Narusawa: "A method for automatic extraction of model parameters from fundamental frequency contours of speech"Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. 1. 509-512 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Keikichi Hirose: "Improved corpus-based synthesis of fundamental frequency contours using generation process model"Proc. International Conference on Spoken Language Processing. 3. 2085-2088 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model"Proc. International Conference on Spoken Language Processing. 3. 1721-1724 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Keikichi Hirose: "Corpus-based synthesis of F0 contours for emotional speech using the generation process model"Proceedings 15th International Congress of Phonetic Sciences. (to be published). (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "A study of a speech analysis method based on the AR-HMM model for speech synthesis"IEICE Technical Report (Speech). 35-40 (2002)

    • Related Report
      2002 Annual Research Report
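  • [Publications] Toshiya Katsura: "Corpus-based prosody generation based on the generation process model for emotional speech synthesis and its evaluation"IEICE Technical Report (Speech). (to be published). (2003)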

    • Related Report
      2002 Annual Research Report
  • [Publications] Keikichi Hirose: "Para- and non-linguistic information in speech information processing"Record of Fall Meeting, Acoust. Soc. Japan. I. 243-246 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Wentao Gu: "Considerations on acoustic models for HMM-based Mandarin synthesis"Record of Spring Meeting, Acoust. Soc. Japan. (to be published). (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Automatic speech analysis based on AR-HMM modeling for speech synthesis"Record of Spring Meeting, Acoust. Soc. Japan. (to be published). (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Keikichi Hirose: "Temporal rate change of dialogue speech in prosodic units as compared to read speech"Speech Communication. 36・1-2. 97-111 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Generation of F0 contours using model-constrained data-driven method"Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. 2. 817-820 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Keikichi Hirose: "Corpus-based synthesis of fundamental frequency contours based on a generation process model"Proc. European Conference on Speech Communication and Technology. 3. 2255-2258 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Nobuaki Minematsu: "Quantitative analysis of F0-induced variations of cepstrum coefficients"Proceedings ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding. 113-117 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Keikichi Hirose: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model"Proc. Speech Prosody 2002. (to be published). (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Shuichi Narusawa: "A method for automatic extraction of model parameters from fundamental frequency contours of speech"Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. (to be published). (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Masaya Eto: "Generation of fundamental frequency contours using the generation process model and statistical methods"IEICE Technical Report (Speech). 1-8 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Formant speech synthesis partly using waveform concatenative synthesis"IEICE Technical Report (Speech). 35-42 (2001)

    • Related Report
      2001 Annual Research Report
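  • [Publications] Masaya Eto: "Generation of fundamental frequency contours taking syntactic structure into account using the generation process model and statistical methods"IEICE Technical Report (Speech). 17-22 (2002)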

    • Related Report
      2001 Annual Research Report
  • [Publications] Nettre Benjamin: "An experimental study on concatenative speech synthesis using a fusion technique and VCV/VV units"IEICE Technical Report (Speech). 53-60 (2002)

    • Related Report
      2001 Annual Research Report
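  • [Publications] Nobuyuki Nishizawa: "A study on reducing consonant waveform templates in a hybrid formant speech synthesis system that also uses natural speech waveforms"Record of Meeting, Acoust. Soc. Japan. I. 237-238 (2001)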

    • Related Report
      2001 Annual Research Report
  • [Publications] Masaya Eto: "Improvement of F0 contour generation by statistical models for text-to-speech synthesis systems"Record of Meeting, Acoust. Soc. Japan. I. 245-246 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Ryuji Kita: "Construction of Japanese accent sandhi rules for text-to-speech synthesis"Record of Meeting, Acoust. Soc. Japan. I. 247-248 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Estimation of formant synthesis parameters using an HMM-based source model"Record of Meeting, Acoust. Soc. Japan. I. 357-358 (2002)

    • Related Report
      2001 Annual Research Report
  • [Publications] Keikichi Hirose: "Temporal rate change of dialogue speech in prosodic units as compared to read speech"Speech Communication. (to be published). (2001)

    • Related Report
      2000 Annual Research Report
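  • [Publications] Nobuaki Minematsu: "An experimental study on improving the quality of F0-modified speech based on PSOLA analysis-synthesis"IEICE Transactions (in Japanese). J83-D-II・7. 1590-1599 (2000)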

    • Related Report
      2000 Annual Research Report
  • [Publications] Keikichi Hirose: "Analytical and perceptual study on the role of acoustic features in realizing emotional speech"Proc. International Conf. on Spoken Language Processing. 2. 369-372 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese"Proc. International Conf. on Spoken Language Processing. 1. 725-728 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Data-driven intonation modeling using a neural network and a command response model"Proc. International Conf. on Spoken Language Processing. 3. 223-226 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions"Proc. International Conf. on Spoken Language Processing. 3. 259-262 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Generation of F0 contours using model-constrained data-driven method"Proc. IEEE International Conf. on Acoustics, Speech, & Signal Processing. (to be published). (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Development of a formant analysis-synthesis system and synthesis of liquid sounds"IEICE Technical Report (Speech). 33-40 (2000)

    • Related Report
      2000 Annual Research Report
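  • [Publications] Nobuaki Minematsu: "Quantitative analysis of the dependency between spectral envelope and fundamental frequency in Japanese speech"IEICE Technical Report (Speech). (to be published). (2001)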

    • Related Report
      2000 Annual Research Report
  • [Publications] Nobuyuki Nishizawa: "Generation of high-quality liquid sounds by terminal analogue synthesis"Record of Meeting, Acoust. Soc. Japan. I. 237-238 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Atsuhiro Sakurai: "Derivation of F0 contour generation process model parameters using a neural network"Record of Meeting, Acoust. Soc. Japan. I. 249-250 (2000)

    • Related Report
      2000 Annual Research Report
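  • [Publications] Nobuyuki Nishizawa: "A study of speech synthesis combining waveform concatenation and terminal analogue synthesis"Record of Meeting, Acoust. Soc. Japan. I. 315-316 (2000)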

    • Related Report
      2000 Annual Research Report
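  • [Publications] Masaya Eto: "Derivation of fundamental frequency contour generation process model parameters from text using statistical methods"Record of Meeting, Acoust. Soc. Japan. I. 261-262 (2000)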

    • Related Report
      2000 Annual Research Report

Published: 2000-04-01   Modified: 2016-04-21  
