2016 Fiscal Year Annual Research Report

Establishment of Speech Synthesis Technology Based on Gaussian Process Regression

Research Project

Project/Area Number	15H02724
Research Institution	Tokyo Institute of Technology
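Principal Investigator	Takao Kobayashi (小林隆夫), Tokyo Institute of Technology, School of Engineering, Professor (70153616)
Co-Investigator (Kenkyū-buntansha)	Tomoki Koriyama (郡山知樹), Tokyo Institute of Technology, School of Engineering, Assistant Professor (50749124)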
Project Period (FY)	2015-04-01 – 2018-03-31
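Keywords	Speech information processing / Text-to-speech synthesis / Prosody generation / GPR speech synthesis
Outline of Annual Research Achievements	In this second year of research on the Gaussian process regression (GPR) approach, a new framework for text-to-speech synthesis, we focused on applications to diverse speech synthesis and obtained the following results. First, for the GPR speech synthesis method, we proposed a technique for automatically adding prosodic information to training speech data, trained models on speech data labeled with this technique, and evaluated the quality of GPR speech synthesis. The results showed that the synthetic speech was of higher quality than that of a conventional synthesis method using the same training data. Next, for GPR-based speech synthesis with diverse speaker characteristics, we showed that, in the proposed method that uses multi-speaker speech data and speaker adaptation based on linear transformations in the feature space, combining multiple linear transformations improves the quality of the synthetic speech. Furthermore, for generating speech with diverse speaking styles and emotional expressions in addition to diverse speaker characteristics, we examined a style adaptation method that uses feature transformation and showed that speech with the desired style can be synthesized even from a small amount of style training speech. We also examined GPR speech synthesis trained on audiobook speech, which contains a variety of speaking styles, and showed that quality improves over the conventional method. As speech synthesis for universal communication, in addition to Japanese we addressed Thai, a tonal language for which prosody generation is difficult: we proposed a new phone-duration modeling method for GPR speech synthesis and demonstrated its effectiveness. We also built a GPR speech synthesis system for English. In addition, as a fundamental technology contributing to the development of speech interfaces, we examined a voice conversion method that does not use parallel data.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The aim of this research is to propose a speech synthesis method based on Gaussian process regression (GPR speech synthesis), a new framework for speech synthesis, and to establish its fundamental technologies, so as to enable the synthesis of more natural and diverse speech. In the second year, we focused on applying the GPR speech synthesis system built in the first year to diverse speech synthesis. From this standpoint, we proposed GPR speech synthesis methods for diverse speaker characteristics, for diverse styles, and for Thai and English aimed at universal communication, and their objective and subjective evaluation results showed performance significantly exceeding that of conventional HMM-based speech synthesis systems. The goals for the second year were therefore fully achieved, and we judged that the research is progressing largely as planned.
Strategy for Future Research Activity	The second-year research is proceeding as planned. Apart from changing the additional language for speech synthesis aimed at universal communication from Indonesian to English, no major changes to the plan are needed, so we will continue the research largely in line with the original plan.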

Research Products
(17 results)


Journal Article (9 results) (of which Peer Reviewed: 4 results, Open Access: 2 results, Acknowledgement Compliant: 9 results) Presentation (8 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis (2017)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  Proc. 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
  Volume: － Pages: 5945-5948
- Peer Reviewed / Acknowledgement Compliant
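[Journal Article] Speaker adaptation using shared decision tree context clustering for cross-lingual speech synthesis (2017)
- Author(s)
  長濱大樹, 能勢隆, 郡山知樹, 小林隆夫
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition) (電子情報通信学会論文誌 D)
  Volume: J100-D Pages: 385-393
- DOI
  10.14923/transinfj.2016PDP0020
- Peer Reviewed / Acknowledgement Compliant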
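[Journal Article] A study on the effect of automatic accent information labeling on speech synthesis quality (2017)
- Author(s)
  増子理菜, 郡山知樹, 小林隆夫
- Journal Title
  Proceedings of the Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会講演論文集)
  Volume: CD-ROM Pages: 283-284
- Acknowledgement Compliant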
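[Journal Article] Audiobook speech synthesis based on GPR speech synthesis (2017)
- Author(s)
  津野駿幸, 郡山知樹, 小林隆夫
- Journal Title
  Proceedings of the Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会講演論文集)
  Volume: CD-ROM Pages: 295-296
- Acknowledgement Compliant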
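[Journal Article] Non-parallel-data GMM voice conversion based on context-aware phoneme matching (2017)
- Author(s)
  高橋亮, 郡山知樹, 小林隆夫
- Journal Title
  Proceedings of the Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会講演論文集)
  Volume: CD-ROM Pages: 367-368
- Acknowledgement Compliant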
[Journal Article] Tone modeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  Proc. the 8th International Conference on Speech Prosody (SPEECH PROSODY 2016)
  Volume: － Pages: 1014-1018
- DOI
  10.21437/SpeechProsody.2016-208
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Unsupervised stress information labeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  Proc. 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016)
  Volume: － Pages: 1591-1595
- DOI
  10.21437/Interspeech.2016-273
- Peer Reviewed / Open Access / Acknowledgement Compliant
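[Journal Article] A study of style adaptation using piecewise linear feature transformation in GPR speech synthesis (2016)
- Author(s)
  前野雄也, 郡山知樹, 小林隆夫
- Journal Title
  Proceedings of the Acoustical Society of Japan 2016 Autumn Meeting (日本音響学会2016年秋季研究発表会講演論文集)
  Volume: CD-ROM Pages: 213-214
- Acknowledgement Compliant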
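[Journal Article] A study of GMM voice conversion using non-parallel data (2016)
- Author(s)
  高橋亮, 郡山知樹, 小林隆夫
- Journal Title
  Proceedings of the Acoustical Society of Japan 2016 Autumn Meeting (日本音響学会2016年秋季研究発表会講演論文集)
  Volume: CD-ROM Pages: 267-268
- Acknowledgement Compliant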
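[Presentation] A study on the effect of automatic accent information labeling on speech synthesis quality (2017)
- Author(s)
  増子理菜
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会)
- Place of Presentation
  Meiji University, Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15 – 2017-03-17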
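[Presentation] Audiobook speech synthesis based on GPR speech synthesis (2017)
- Author(s)
  津野駿幸
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会)
- Place of Presentation
  Meiji University, Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15 – 2017-03-17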
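[Presentation] Non-parallel-data GMM voice conversion based on context-aware phoneme matching (2017)
- Author(s)
  高橋亮
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting (日本音響学会2017年春季研究発表会)
- Place of Presentation
  Meiji University, Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15 – 2017-03-17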
[Presentation] Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis (2017)
- Author(s)
  Decha Moungsri
- Organizer
  2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
- Place of Presentation
  Hilton New Orleans Riverside (USA)
- Year and Date
  2017-03-05 – 2017-03-09
- Int'l Joint Research
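[Presentation] A study of style adaptation using piecewise linear feature transformation in GPR speech synthesis (2016)
- Author(s)
  前野雄也
- Organizer
  Acoustical Society of Japan 2016 Autumn Meeting (日本音響学会2016年秋季研究発表会)
- Place of Presentation
  University of Toyama (Toyama, Toyama)
- Year and Date
  2016-09-14 – 2016-09-16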
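[Presentation] A study of GMM voice conversion using non-parallel data (2016)
- Author(s)
  高橋亮
- Organizer
  Acoustical Society of Japan 2016 Autumn Meeting (日本音響学会2016年秋季研究発表会)
- Place of Presentation
  University of Toyama (Toyama, Toyama)
- Year and Date
  2016-09-14 – 2016-09-16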
[Presentation] Unsupervised stress information labeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri
- Organizer
  17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
- Place of Presentation
  Hyatt Regency San Francisco (USA)
- Year and Date
  2016-09-08 – 2016-09-12
- Int'l Joint Research
[Presentation] Tone modeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri
- Organizer
  the 8th International Conference on Speech Prosody, SPEECH PROSODY 2016
- Place of Presentation
  Boston University (USA)
- Year and Date
  2016-05-31 – 2016-06-03
- Int'l Joint Research
