2014 Fiscal Year Annual Research Report

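Advanced prosody control for statistical-modeling speech synthesis using the generation process model of fundamental frequency contours (基本周波数パターン生成過程モデルによる統計モデリング音声合成の韻律制御の高度化)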

Research Project

Project/Area Number	24300068
Research Institution	The University of Tokyo
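Principal Investigator	Keikichi Hirose (広瀬啓吉), The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	Daisuke Saito (齋藤大輔), The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (40615150); Nobuaki Minematsu (峯松信明), The University of Tokyo, Graduate School of Engineering, Professor (90273333)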
Project Period (FY)	2012-04-01 – 2015-03-31
Keywords	fundamental frequency contour / generation process model / HMM-based speech synthesis / prosody control / Deep Neural Network / voice conversion / discourse focus / multi-stream
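Outline of Annual Research Achievements	The project aimed to apply the constraints of the generation process model of fundamental frequency (F0) contours to HMM training and synthesis in order to achieve high-quality speech synthesis, and, by focusing on differences in the model's commands, to realize various kinds of voice conversion with high accuracy from small speech corpora. The following results were achieved.
1. The F0 contour of each sample in the training speech corpus was represented hierarchically, within the framework of the generation process model, as a phrase component, an accent component, and a residual (the component not represented by the model). Each of the three was treated as a separate stream in multi-stream HMM training and synthesis. Compared with the conventional approach without hierarchical representation, this gave a clearer correspondence between the linguistic information of the utterance and F0, which improved both the agreement of the generated F0 contours with those of the target speech and the subjective evaluation scores of the synthetic speech. MSD-HMM is the usual choice for modeling F0, but it has problems representing the boundaries between voiced and unvoiced segments; to address this, the continuous F0-HMM was made available.
2. It was confirmed that the F0 contours obtained above can be approximated with high accuracy by the generation process model. On that basis, discourse focus was realized from a small training corpus by controlling the model commands.
3. To use multi-speaker speech data efficiently and raise speaker conversion accuracy, a method was developed that represents the characteristics of each speaker as a matrix variate Gaussian mixture model and then builds the conversion model. It outperformed the conventional eigenvoice conversion method based on Gaussian mixture models.
4. A Deep Neural Network based method for voice conversion among multiple speakers was developed. The network consists of one speaker-independent subnetwork and speaker-dependent subnetworks for the individual speakers, so that speaker-independent and speaker-dependent feature transformations are separated and trained efficiently. The method achieved conversion performance exceeding that of conventional methods.
5. A prototype Japanese-Chinese speech translation system was built, and the results obtained so far were verified by attempting language conversion that preserves speaker identity.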
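The decomposition in item 1 and the command control in item 2 can be illustrated with a minimal Python sketch of the generation process model (widely known as the Fujisaki model), in which ln F0 is the sum of a baseline, a phrase component, and an accent component. The constants, the command values, and the helper names (`components`, `residual`, `emphasize`) are assumptions made for illustration; the report does not give its parameter values or its actual focus control procedure.

```python
import math

# Commonly used constants for the generation process (Fujisaki) model.
# These are assumptions for illustration; the report does not state its values.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def components(t, phrase_cmds, accent_cmds):
    """Return the (phrase, accent) components of ln F0 at time t in seconds.

    phrase_cmds: list of (onset, magnitude)
    accent_cmds: list of (onset, offset, amplitude)
    """
    phrase = sum(a * phrase_response(t - t0) for t0, a in phrase_cmds)
    accent = sum(
        a * (accent_response(t - t1) - accent_response(t - t2))
        for t1, t2, a in accent_cmds
    )
    return phrase, accent


def model_f0(t, fb, phrase_cmds, accent_cmds):
    """F0 in Hz, from ln F0 = ln Fb + phrase component + accent component."""
    phrase, accent = components(t, phrase_cmds, accent_cmds)
    return fb * math.exp(phrase + accent)


def residual(observed_f0, t, fb, phrase_cmds, accent_cmds):
    """Part of the observed ln F0 that the model does not represent."""
    return math.log(observed_f0) - math.log(model_f0(t, fb, phrase_cmds, accent_cmds))


def emphasize(accent_cmds, index, gain):
    """Hypothetical focus control: scale the amplitude of one accent command."""
    return [
        (t1, t2, a * gain if i == index else a)
        for i, (t1, t2, a) in enumerate(accent_cmds)
    ]


# Hypothetical commands for a short utterance (not taken from the report).
FB = 120.0
PHRASE_CMDS = [(0.0, 0.5)]
ACCENT_CMDS = [(0.3, 0.6, 0.4), (0.9, 1.2, 0.3)]
FOCUSED_CMDS = emphasize(ACCENT_CMDS, 0, 1.5)
```

Calling `components` on a frame grid gives the phrase and accent streams, and `residual` gives the third stream for an observed contour. Raising one accent command amplitude, as in `FOCUSED_CMDS`, is only a simplified stand-in for the command-difference approach the report describes.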
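Research Progress Status	Not entered, because FY2014 (Heisei 26) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2014 (Heisei 26) is the final year of the project.
Causes of Carryover	Not entered, because FY2014 (Heisei 26) is the final year of the project.
Expenditure Plan for Carryover Budget	Not entered, because FY2014 (Heisei 26) is the final year of the project.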

Research Products
(23 results)

Journal Article (10 results; Peer Reviewed: 10, Acknowledgement Compliant: 3) / Presentation (12 results; Invited: 4) / Book (1 result)

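[Journal Article] Automatic estimation of model parameters of the generation process model of fundamental frequency contours for HMM-based speech synthesis (HMM音声合成を目的とした基本周波数パターン生成過程モデルのモデルパラメータ自動推定) (2015)
- Author(s)
  Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  IEICE Transactions, Japanese edition (電子情報通信学会論文誌)
  Volume: J98-D Pages: 481-491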
- DOI
  10.14923/transinfj.2014PDP0030
- Peer Reviewed
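[Journal Article] Control of utterance focus based on differences in the commands of the generation process model of fundamental frequency contours (基本周波数パターン生成過程モデルの指令の差分に着目した発話の焦点制御) (2015)
- Author(s)
  Keiko Ochi (越智景子), Keikichi Hirose, Nobuaki Minematsu
- Journal Title
  IEICE Transactions, Japanese edition (電子情報通信学会論文誌)
  Volume: J98-D Pages: 524-533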
- DOI
  10.14923/transinfj.2014JDP7084
- Peer Reviewed
[Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, Xiaoying Xu
- Journal Title
  Proceedings of International Conference on Speech Prosody
  Volume: 1 Pages: 1032-1036
- Peer Reviewed
[Journal Article] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami, Hiroya Hashimoto, Keikichi Hirose, Daisuke Saito, Nobuaki Minematsu
- Journal Title
  Proceedings of International Conference on Speech Prosody
  Volume: 1 Pages: 1042-1046
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Robust pitch estimation using ensemble empirical mode decomposition (2014)
- Author(s)
  Sujan Kumar Roy, Md. Khademul Islam Molla, Keikichi Hirose
- Journal Title
  Proceedings of International Conference on Speech Prosody
  Volume: 1 Pages: 534-538
- Peer Reviewed
[Journal Article] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)
- Author(s)
  Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  Proceedings INTERSPEECH 2014
  Volume: 1 Pages: 2504-2508
- Peer Reviewed
[Journal Article] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Journal Title
  Proceedings of Forum Acusticum
  Volume: 1 Pages: 1-6
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Tensor representation for speaker characteristics in speech (2014)
- Author(s)
  Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  Proceedings of Forum Acusticum
  Volume: 1 Pages: 1-5
- Peer Reviewed
[Journal Article] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose, Hiroya Hashimoto, Kyota Hyakutake, Daisuke Saito, Nobuaki Minematsu
- Journal Title
  Proceedings IEEE International Conference on Signal Processing
  Volume: 1 Pages: 555-560
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Voice conversion based on matrix variate Gaussian mixture model (2014)
- Author(s)
  Daisuke Saito, H. Doi, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  Proceedings IEEE International Conference on Signal Processing
  Volume: 1 Pages: 567-576
- Peer Reviewed
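[Presentation] Hierarchical representation of fundamental frequency contours by the generation process model and multi-stream training for HMM-based speech synthesis (生成過程モデルによる基本周波数パターンの階層表現とHMM音声合成のマルチストリーム学習) (2015)
- Author(s)
  島田智大
- Organizer
  Acoustical Society of Japan Spring Meeting (日本音響学会春季講演会)
- Place of Presentation
  Chuo University, Bunkyo-ku, Tokyo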
- Year and Date
  2015-03-16 – 2015-03-18
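[Presentation] Voice conversion based on a deep neural network with multiple output subnetworks (複数出力サブネットワークを有するディープニューラルネットワークに基づく声質変換) (2014)
- Author(s)
  橋本哲弥
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)
- Place of Presentation
  Tokyo Institute of Technology (Suzukakedai), Yokohama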
- Year and Date
  2014-12-15 – 2014-12-16
[Presentation] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Hangzhou, China
- Year and Date
  2014-10-19 – 2014-10-23
- Invited
[Presentation] Voice conversion based on matrix variate Gaussian mixture model (2014)
- Author(s)
  Daisuke Saito
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Hangzhou, China
- Year and Date
  2014-10-19 – 2014-10-23
- Invited
[Presentation] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)
- Author(s)
  Daisuke Saito
- Organizer
  INTERSPEECH 2014
- Place of Presentation
  Changi, Singapore
- Year and Date
  2014-09-14 – 2014-09-18
[Presentation] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  Forum Acusticum 2014
- Place of Presentation
  Krakow, Poland
- Year and Date
  2014-09-07 – 2014-09-12
- Invited
[Presentation] Tensor representation for speaker characteristics in speech (2014)
- Author(s)
  Daisuke Saito
- Organizer
  Forum Acusticum 2014
- Place of Presentation
  Krakow, Poland
- Year and Date
  2014-09-07 – 2014-09-12
- Invited
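[Presentation] Many-to-one voice conversion by deep learning using speaker-dependent subnetworks (話者依存サブネットワークを用いた深層学習による多対一声質変換) (2014)
- Author(s)
  橋本哲哉
- Organizer
  Acoustical Society of Japan Autumn Meeting (日本音響学会秋季講演会)
- Place of Presentation
  Hokkai-Gakuen University, Sapporo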
- Year and Date
  2014-09-03 – 2014-09-05
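[Presentation] Mixture models of matrix variate normal distributions and their application to voice conversion (行列変量正規分布の混合モデルとその声質変換への応用) (2014)
- Author(s)
  Daisuke Saito
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (情報処理学会音声言語情報処理研究会)
- Place of Presentation
  Hotel Hanamaki, Hanamaki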
- Year and Date
  2014-07-24 – 2014-07-26
[Presentation] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
[Presentation] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
[Presentation] Robust pitch estimation using ensemble empirical mode decomposition (2014)
- Author(s)
  Sujan Kumar Roy
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
[Book] Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis (2015)
- Author(s)
  Keikichi Hirose, Jianhua Tao (editors)
- Total Pages
  213
- Publisher
  Springer-Verlag
