2014 Fiscal Year Annual Research Report

Advancing Robust Speech Synthesis and Extending It to Multilingual Speech Communication (ロバスト音声合成の深化と多言語音声コミュニケーションへの展開)

Research Project

Project/Area Number	24300071
Research Institution	Tokyo Institute of Technology
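Principal Investigator	Takao Kobayashi (小林隆夫), Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Professor (70153616)
Co-Investigator(Kenkyū-buntansha)	Takashi Nose (能勢隆), Tohoku University, Graduate School of Engineering, Lecturer (90550591)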
Project Period (FY)	2012-04-01 – 2015-03-31
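Keywords	text-to-speech synthesis / HMM-based speech synthesis / fundamental frequency normalization training / prosodic labeling / cross-lingual speech synthesis / international information exchange (Indonesia)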
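Outline of Annual Research Achievements	Aiming to deepen and advance robust speech synthesis technology, we built on the results obtained through the second year of the project, carrying out theoretical studies of the underlying techniques and refining the proposed methods. We also examined multilingual applications of robust speech synthesis for Thai, Indonesian, and English. The following results were obtained.
1. Speech synthesis robust to expressiveness: To reduce accent-type errors in synthesized Japanese speech, we investigated fundamental frequency (F0) normalization training based on the high-low patterns of accent types and demonstrated its effectiveness through evaluation experiments. In addition, for speech synthesis based on Gaussian process regression, a new approach to statistical speech synthesis, we showed that performance improves further when global variance and dynamic features, both of which have proven effective in conventional hidden Markov model (HMM)-based methods, are introduced into the proposed method.
2. Synthesis of spontaneous and conversational speech: High-quality synthetic speech requires training data with appropriate context labels, but accurate manual labeling is costly. To address this, we proposed an automatic prosodic labeling method for the accent phrases and accent types of Japanese sentence speech and showed that it yields synthetic speech comparable to that obtained with manual labeling.
3. Speech synthesis for under-resourced languages: For Thai speech synthesis, we proposed a method that automatically labels stress context, which is effective for improving tone reproduction, and demonstrated its effectiveness. For Indonesian speech synthesis, we examined a prototype speech synthesis system using the speech data recorded in the previous fiscal year from one male and one female speaker.
4. Multilingual speech synthesis: For the cross-lingual speech synthesis method based on speaker adaptation using shared decision trees, we conducted a more detailed study of a system targeting English and Japanese. To further improve model performance, we also recorded and labeled new English-Japanese bilingual speech data.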
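Research Progress Status	Not entered, because FY2014 (Heisei 26) is the final fiscal year of the project.
Strategy for Future Research Activity	Not entered, because FY2014 (Heisei 26) is the final fiscal year of the project.
Causes of Carryover	Not entered, because FY2014 (Heisei 26) is the final fiscal year of the project.
Expenditure Plan for Carryover Budget	Not entered, because FY2014 (Heisei 26) is the final fiscal year of the project.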

Research Products
(24 results)

All Journal Article (13 results) (of which Peer Reviewed: 9 results, Acknowledgement Compliant: 13 results, Open Access: 3 results) Presentation (11 results)

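[Journal Article] A study of a speech synthesis system based on Gaussian process regression (ガウス過程回帰に基づく音声合成システムの検討) (2015)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of the 2015 Spring Meeting of the Acoustical Society of Japan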
  
  Volume: CD-ROM Pages: 269-270
- Acknowledgement Compliant
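[Journal Article] Evaluation of automatic prosodic labeling using language and acoustic models (言語モデルと音響モデルを用いた自動韻律ラベリングの評価) (2015)
- Author(s)
  増子理菜, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of the 2015 Spring Meeting of the Acoustical Society of Japan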
  
  Volume: CD-ROM Pages: 361-362
- Acknowledgement Compliant
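[Journal Article] A study of contexts for speech synthesis based on Gaussian process regression (ガウス過程回帰に基づく音声合成のためのコンテキストの検討) (2015)
- Author(s)
  岡元伶洋, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of the 2015 Spring Meeting of the Acoustical Society of Japan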
  
  Volume: CD-ROM Pages: 371-372
- Acknowledgement Compliant
[Journal Article] Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis (2015)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Volume: ICASSP 2015 Pages: 4929-4933
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Statistical parametric speech synthesis based on Gaussian process regression (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  IEEE Journal of Selected Topics in Signal Processing
  
  Volume: 8 Pages: 173-183
- DOI
  10.1109/JSTSP.2013.2283461
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] A parameter generation algorithm using local variance for HMM-based speech synthesis (2014)
- Author(s)
  Takashi Nose, Vataya Chunwijitra, Takao Kobayashi
- Journal Title
  
  IEEE Journal of Selected Topics in Signal Processing
  
  Volume: 8 Pages: 221-228
- DOI
  10.1109/JSTSP.2013.2283459
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Volume: ICASSP 2014 Pages: 3862-3866
- DOI
  10.1109/ICASSP.2014.6854319
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Tone modeling using stress information for HMM-based Thai speech synthesis (2014)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of the 7th International Conference on Speech Prosody
  
  Volume: SPEECHPROSODY 7 Pages: 1057-1061
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis (2014)
- Author(s)
  Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of the 15th Annual Conference of the International Speech Communication Association
  
  Volume: INTERSPEECH 2014 Pages: 770-774
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling (2014)
- Author(s)
  Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi
- Journal Title
  
  Proceedings of the 15th Annual Conference of the International Speech Communication Association
  
  Volume: INTERSPEECH 2014 Pages: 2337-2341
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Parametric speech synthesis using local and global sparse Gaussian processes (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of IEEE International Workshop on Machine Learning for Signal Processing
  
  Volume: MLSP 2014 Pages: 1-6
- DOI
  10.1109/MLSP.2014.6958921
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] HMM-based Thai speech synthesis using unsupervised stress context labeling (2014)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
  
  Volume: APSIPA ASC 2014 Pages: 1-4
- DOI
  10.1109/APSIPA.2014.7041599
- Peer Reviewed / Acknowledgement Compliant
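[Journal Article] A study of F0 pattern generation based on Gaussian process regression (ガウス過程回帰に基づくF0パタン生成の検討) (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of the 2014 Autumn Meeting of the Acoustical Society of Japan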
  
  Volume: CD-ROM Pages: 247-248
- Acknowledgement Compliant
[Presentation] Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis (2015)
- Author(s)
  Tomoki Koriyama
- Organizer
  2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
- Place of Presentation
  Brisbane Convention & Exhibition Centre (Australia)
- Year and Date
  2015-04-19 – 2015-04-24
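[Presentation] Evaluation of automatic prosodic labeling using language and acoustic models (言語モデルと音響モデルを用いた自動韻律ラベリングの評価) (2015)
- Author(s)
  増子理菜
- Organizer
  2015 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Chuo University Korakuen Campus (Tokyo)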
- Year and Date
  2015-03-16 – 2015-03-18
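[Presentation] A study of a speech synthesis system based on Gaussian process regression (ガウス過程回帰に基づく音声合成システムの検討) (2015)
- Author(s)
  Tomoki Koriyama
- Organizer
  2015 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Chuo University Korakuen Campus (Tokyo)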
- Year and Date
  2015-03-16 – 2015-03-18
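[Presentation] A study of contexts for speech synthesis based on Gaussian process regression (ガウス過程回帰に基づく音声合成のためのコンテキストの検討) (2015)
- Author(s)
  岡元伶洋
- Organizer
  2015 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Chuo University Korakuen Campus (Tokyo)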
- Year and Date
  2015-03-16 – 2015-03-18
[Presentation] HMM-based Thai speech synthesis using unsupervised stress context labeling (2014)
- Author(s)
  Decha Moungsri
- Organizer
  2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2014
- Place of Presentation
  Sokha Angkor Resort (Cambodia)
- Year and Date
  2014-12-09 – 2014-12-12
[Presentation] Parametric speech synthesis using local and global sparse Gaussian processes (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  International Workshop on Machine Learning for Signal Processing, MLSP2014
- Place of Presentation
  Reims Centre De Congres (France)
- Year and Date
  2014-09-21 – 2014-09-24
[Presentation] Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis (2014)
- Author(s)
  Daiki Nagahama
- Organizer
  The 15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014
- Place of Presentation
  Singapore Expo (Singapore)
- Year and Date
  2014-09-14 – 2014-09-18
[Presentation] Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  The 15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014
- Place of Presentation
  Singapore Expo (Singapore)
- Year and Date
  2014-09-14 – 2014-09-18
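[Presentation] A study of F0 pattern generation based on Gaussian process regression (ガウス過程回帰に基づくF0パタン生成の検討) (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  2014 Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Hokkai-Gakuen University Toyohira Campus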
- Year and Date
  2014-09-03 – 2014-09-05
[Presentation] Tone modeling using stress information for HMM-based Thai speech synthesis (2014)
- Author(s)
  Decha Moungsri
- Organizer
  The 7th International Conference on Speech Prosody, SPEECHPROSODY 7
- Place of Presentation
  Trinity College (Ireland)
- Year and Date
  2014-05-20 – 2014-05-23
[Presentation] Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
- Place of Presentation
  "Fortezza Da Basso" Convention & Exhibition Centre (Italy)
- Year and Date
  2014-05-04 – 2014-05-09
