2013 Fiscal Year Annual Research Report

Advanced prosody control for statistical-modeling speech synthesis using the generation process model of fundamental frequency contours

Project/Area Number	24300068
Research Institution	The University of Tokyo
Principal Investigator	Keikichi Hirose (広瀬啓吉), The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
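Co-Investigator(Kenkyū-buntansha)	Nobuaki Minematsu (峯松信明), The University of Tokyo, Graduate School of Engineering, Professor (90273333); Daisuke Saito (齋藤大輔), The University of Tokyo, Interfaculty Initiative in Information Studies / Graduate School of Interdisciplinary Information Studies, Assistant Professor (40615150)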
Project Period (FY)	2012-04-01 – 2015-03-31
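Keywords	fundamental frequency contour / generation process model / HMM-based speech synthesis / prosody control / statistical modeling / voice conversion / focus control / Chinese speech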
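Research Abstract	This project applies the constraints of the generation process model to HMM training and synthesis in order to improve the quality of synthetic speech and, by focusing on differences between model commands, to realize various kinds of speech conversion more accurately from a small speech corpus. The following results were achieved.
1. We had already shown that replacing the fundamental frequency (F0) contours of the training corpus with their generation-process-model approximations before training the phoneme HMMs gives higher synthesis quality than the conventional approach. This year, the part of the F0 contour that the model cannot represent was incorporated into HMM-based speech synthesis as an F0 residual. Incorporating it caused no loss of quality, which confirms that the approach is valid. The F0 contour represented by the generation process model relates to accent types, phrase boundaries, and similar factors, whereas the F0 residual relates to comparatively short-term information such as phoneme identity. The method can therefore be regarded as introducing the hierarchical structure of F0 contours into HMM-based speech synthesis.
2. When F0 contours approximated by the generation process model are used as the HMM training corpus, the F0 contours generated at synthesis time should conform to the model, so that model parameters can be extracted automatically with ease and high accuracy. Experiments confirmed this. Manipulations such as adding focus thereby become possible.
3. Portions of the training corpus where the F0 contour fluctuates strongly are likely to reflect F0 analysis problems and to harm HMM training. Assuming that such portions correspond to those where the approximation error of the generation process model becomes large, we developed a method that excludes them from training phoneme by phoneme, and confirmed its effectiveness by listening to the synthetic speech.
4. For voice conversion among many speakers, we developed a method that builds the conversion model by representing the characteristics of each speaker as a matrix. Conversion among three speakers was implemented and its effectiveness confirmed.
5. The focus control method based on differences in generation-process-model commands was applied to Chinese speech, realizing hierarchical focus control.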
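The decomposition in items 1 and 3 can be illustrated with a small sketch. It assumes the widely used formulation of the generation process (Fujisaki) model, in which log F0 is a baseline plus phrase and accent components, and treats the F0 residual as the observed log F0 minus the model value. The constants `ALPHA`, `BETA`, `GAMMA`, the RMS criterion, the threshold, and all function and class names are illustrative assumptions; the report does not give these details.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Constants often used with this model; assumed here, not taken from the report.
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


@dataclass
class PhraseCommand:
    onset: float      # seconds
    magnitude: float


@dataclass
class AccentCommand:
    onset: float      # seconds
    offset: float     # seconds
    amplitude: float


def phrase_response(t: float) -> float:
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return ALPHA * ALPHA * t * math.exp(-ALPHA * t) if t >= 0.0 else 0.0


def accent_response(t: float) -> float:
    """Step response of the accent control mechanism, capped at GAMMA."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


def model_log_f0(t: float, base_f0: float,
                 phrases: Sequence[PhraseCommand],
                 accents: Sequence[AccentCommand]) -> float:
    """log F0 at time t: baseline plus phrase and accent components."""
    value = math.log(base_f0)
    value += sum(p.magnitude * phrase_response(t - p.onset) for p in phrases)
    value += sum(
        a.amplitude * (accent_response(t - a.onset) - accent_response(t - a.offset))
        for a in accents
    )
    return value


def f0_residual(times: Sequence[float], observed: Sequence[float], base_f0: float,
                phrases: Sequence[PhraseCommand],
                accents: Sequence[AccentCommand]) -> List[float]:
    """Part of the observed log F0 that the model does not represent."""
    return [obs - model_log_f0(t, base_f0, phrases, accents)
            for t, obs in zip(times, observed)]


def phonemes_to_exclude(residual: Sequence[float],
                        segments: Sequence[Tuple[str, int, int]],
                        threshold: float) -> List[str]:
    """Labels of segments (label, first_frame, end_frame) whose RMS residual
    exceeds the threshold. RMS is an assumed error measure."""
    excluded = []
    for label, first, end in segments:
        frames = residual[first:end]
        if frames:
            rms = math.sqrt(sum(r * r for r in frames) / len(frames))
            if rms > threshold:
                excluded.append(label)
    return excluded


# Toy data: a model-generated contour with an artificial error in frames 60-69.
times = [i * 0.01 for i in range(100)]
phrases = [PhraseCommand(onset=0.0, magnitude=0.5)]
accents = [AccentCommand(onset=0.2, offset=0.5, amplitude=0.4)]
smooth = [model_log_f0(t, 120.0, phrases, accents) for t in times]
observed = [v + (0.3 if 60 <= i < 70 else 0.0) for i, v in enumerate(smooth)]
residual = f0_residual(times, observed, 120.0, phrases, accents)
segments = [("a", 0, 30), ("k", 30, 60), ("o", 60, 70), ("e", 70, 100)]
print(phonemes_to_exclude(residual, segments, threshold=0.1))
```

On this toy input only the segment labeled "o" is flagged, because it is the only one whose frames deviate from the model contour. In a real system the command parameters would come from automatic extraction rather than being set by hand.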
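Current Status of Research Progress	2: Research has, on the whole, progressed more than originally planned.
Reason	The goal of this research is to introduce the constraints of the generation process model into HMM-based speech synthesis and thereby improve the quality of synthetic speech and the flexibility of its control, particularly with respect to prosody. Toward that goal we developed two methods: one approximates the F0 contours of the training corpus with the generation process model and uses them for model training, and the other approximates the F0 contours output by synthesis with the model and then resynthesizes. Their effectiveness was confirmed by listening to the synthetic speech, so the research is progressing smoothly. For the former method, we confirmed that the synthesized F0 contours can be approximated easily and well by the generation process model; from the viewpoint of introducing generation-process-model constraints into HMM-based speech synthesis, the results exceed the original plan. We also built a framework for handling the F0 residual in HMM-based speech synthesis. This amounts to introducing a hierarchical F0 representation into HMM-based speech synthesis and holds considerable promise for further development. Regarding flexible prosody control, the method for focus control is largely established, but work on the correspondence of commands for intention, attitude, and emotion is somewhat behind schedule. We will resolve this using the results of ongoing work on handling multiple commands. For Chinese, focus control based on differences in generation-process-model commands was achieved as originally planned. For GMM-based voice conversion, a method based on matrix representation was implemented and produced good results.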
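Strategy for Future Research Activity	Building on prosody conversion that focuses on differences in generation-process-model commands, we will develop a prosody conversion method for speaker conversion. We will also work intensively on cases where parallel speech from the source and target speakers is not available, as in language conversion. We have been developing a method that estimates generation-process-model commands with binary trees. When parallel speech is available, we plan to train on the source and target speech jointly and realize the conversion by establishing correspondences between leaves. When parallel speech is not available, we will prepare target speech from multiple speakers, build the command-estimation binary trees, establish the correspondence between source and target speech using bilingual speech, and interpolate the values estimated by the binary trees on that basis.
In the course of introducing the F0 residual into HMM-based speech synthesis, we are training mel-cepstral coefficients, the F0 given by the F0 model, and the F0 residual as multiple streams. Our impression is that this will make it easy to replace the conventional multi-space probability distribution HMM with a continuous F0 HMM, which is expected to give good prosody control at voiced/unvoiced boundaries. This was not part of the research proposal, but it is expected to improve synthesis performance, and we will pursue it vigorously. Evaluation of the developed synthesis methods by implementing them in spoken dialogue systems and speech translation systems will proceed as originally planned.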
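As a rough illustration of what a continuous F0 representation involves, the sketch below fills unvoiced frames of a log F0 track by linear interpolation between neighbouring voiced frames, so that every frame carries a value. This is one common way of obtaining a continuous contour and is an assumption here; the report does not say how the continuous F0 HMM would be trained. The function name and the use of `None` for unvoiced frames are hypothetical.

```python
from typing import List, Optional


def interpolate_unvoiced(log_f0: List[Optional[float]]) -> List[float]:
    """Return a continuous log F0 track.

    Unvoiced frames (None) between two voiced frames are filled by linear
    interpolation; unvoiced frames at either end copy the nearest voiced value.
    """
    voiced = [i for i, v in enumerate(log_f0) if v is not None]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")

    out = list(log_f0)

    # Hold the first and last voiced values at the utterance edges.
    for i in range(voiced[0]):
        out[i] = log_f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(log_f0)):
        out[i] = log_f0[voiced[-1]]

    # Interpolate linearly across each gap between consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            out[i] = (1.0 - w) * log_f0[left] + w * log_f0[right]
    return out


track = [None, 4.0, None, 5.0, None]
print(interpolate_unvoiced(track))
```

With a continuous track like this, voicing would have to be modelled separately (for example as its own stream), which is the design difference from a multi-space probability distribution HMM that treats voiced and unvoiced frames as different spaces.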
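Expenditure Plans for the Next FY Research Funding	Part of the evaluation experiments on synthetic speech will be carried out in the next fiscal year. The funds will be spent on honoraria for the participants in those experiments.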

Research Products
(13 results)

Journal Article: 6 results (Peer Reviewed: 4, Acknowledgement Compliant: 1); Presentation: 7 results (Invited: 1)

[Journal Article] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami, Hiroya Hashimoto, Keikichi Hirose, Daisuke Saito, and Nobuaki Minematsu
- Journal Title
  Proceedings of International Conference on Speech Prosody
  Volume: 1 Pages: 1042-1046
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, and Xiaoying Xu
- Journal Title
  Proceedings of International Conference on Speech Prosody
  Volume: 1 Pages: 1032-1036
- Peer Reviewed
[Journal Article] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)
- Author(s)
  Keikichi Hirose
- Journal Title
  Proceedings of International Symposium on Frontiers of Research on Speech and Music
  Volume: 1 Pages: 96-100
[Journal Article] Generation of fundamental frequency contours for Thai speech using the tone nucleus model (2013)
- Author(s)
  Oraphan Krityakien, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  Journal of Signal Processing, Research Institute of Signal Processing
  Volume: 16 Pages: 135-138
- Peer Reviewed
[Journal Article] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)
- Author(s)
  Hiroya Hashimoto, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  Proceedings 8th ISCA Workshop on Speech Synthesis
  Volume: 1 Pages: 35-39
- Peer Reviewed
[Journal Article] Toward flexible and systematic control of fundamental frequencies in HMM-based speech synthesis (2013)
- Author(s)
  Keikichi Hirose
- Journal Title
  Journal of English Phonetics Society of Japan
  Volume: 18 Pages: 121-128
[Presentation] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  International Symposium on Frontiers of Research on Speech and Music
- Place of Presentation
  Mysore, India
- Year and Date
  2014-03-13 – 2014-03-14
- Invited
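[Presentation] An experimental study of HMM-based speech synthesis that takes into account F0 contour residuals of the generation process model (2014)
- Author(s)
  百武恭汰
- Organizer
  Acoustical Society of Japan, National Meeting
- Place of Presentation
  Nihon University, Tokyo
- Year and Date
  2014-03-10 – 2014-03-12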
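[Presentation] A study of voice conversion based on matrix-variate Gaussian mixture models (2014)
- Author(s)
  土井秀信
- Organizer
  Acoustical Society of Japan, National Meeting
- Place of Presentation
  Nihon University, Tokyo
- Year and Date
  2014-03-10 – 2014-03-12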
[Presentation] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
[Presentation] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
[Presentation] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)
- Author(s)
  Hiroya Hashimoto
- Organizer
  8th ISCA Workshop on Speech Synthesis
- Place of Presentation
  Barcelona, Spain
- Year and Date
  2013-08-31 – 2013-09-03
[Presentation] Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model (2013)
- Author(s)
  Keikichi Hirose
- Organizer
  INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Year and Date
  2013-08-26 – 2013-08-29
