Spoken-Style Conversational Speech Synthesis for Natural Human-Computer Interaction

Research Project

Project/Area Number	13J08776
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Media informatics/Database
Research Institution	Tokyo Institute of Technology
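Principal Investigator	Tomoki Koriyama (郡山知樹), Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Assistant Professor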
Project Period (FY)	2013-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥2,300,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥300,000); Fiscal Year 2014: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2013: ¥1,000,000 (Direct Cost: ¥1,000,000)
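Keywords	Speech synthesis / Spoken language / Statistical machine learning / Gaussian process regression / Hidden Markov model / Statistical speech synthesis / Nonparametric Bayes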
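Outline of Annual Research Achievements	Until now, speech synthesis research has focused largely on read-aloud and announcer-style speech. Recent studies have reported that emotional expressions and speaking styles such as joy and anger can be reproduced at relatively low cost, but synthesizing natural, spoken-style speech of the kind used in everyday conversation has not yet been achieved. One reason is that database construction, the selection of explanatory variables for speech, and modeling methods have not been studied sufficiently for realizing the diverse expressions found in spontaneous conversational speech, such as speech intentions (for example, questions and confirmations) and fillers such as "ああ" ("ah") and "うん" ("uh-huh"). To apply hidden Markov model-based speech synthesis (HMM speech synthesis) to spoken-style speech, the principal investigator previously proposed a modeling method whose units are prosodic events, such as the rising intonation found in questions, in contrast to conventional phoneme-unit modeling. However, because HMM speech synthesis is constrained to state-level modeling, this did not lead to the generation of natural spoken-style conversational speech. This study therefore proposed a new speech synthesis method based on Gaussian process regression (GPR speech synthesis), which models speech at the frame level rather than at the state level as in HMMs. For read-style speech, the spectrum, which represents phonetic content, and F0, which represents prosody, were modeled, and the method was shown to synthesize speech with higher naturalness than conventional HMM speech synthesis. GPR speech synthesis is a highly flexible method that makes it easy to introduce input variables specific to spoken language, so it is expected to lead to improved naturalness of spoken-style conversational speech in the future.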
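Research Progress Status	Not entered because fiscal year 2014 (Heisei 26) was the final year of the project.
Strategy for Future Research Activity	Not entered because fiscal year 2014 (Heisei 26) was the final year of the project.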

Report

(2 results)

2014 Annual Research Report
2013 Annual Research Report

Research Products
(9 results)

Journal Article (1 result, of which Peer Reviewed: 1, Acknowledgement Compliant: 1), Presentation (8 results)

[Journal Article] Statistical Parametric Speech Synthesis Based on Gaussian Process Regression (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  IEEE Journal of Selected Topics in Signal Processing
  Volume: 8 Issue: 2 Pages: 173-183
- DOI
  10.1109/jstsp.2013.2283461
- Related Report
  2014 Annual Research Report 2013 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Presentation] Prosody Generation Using Frame-based Gaussian Process Regression and Classification for Statistical Parametric Speech Synthesis (2015)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Organizer
  ICASSP 2015
- Place of Presentation
  Brisbane Convention & Exhibition Centre, Brisbane, Australia
- Year and Date
  2015-04-19 – 2015-04-24
- Related Report
  2014 Annual Research Report
[Presentation] Parametric Speech Synthesis Using Local and Global Sparse Gaussian Processes (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  The 24th IEEE International Workshop on Machine Learning for Signal Processing
- Place of Presentation
  Reims Centre des Congres, Reims, France
- Year and Date
  2014-09-21 – 2014-09-24
- Related Report
  2014 Annual Research Report
[Presentation] Parametric Speech Synthesis Based on Gaussian Process Regression Using Global Variance and Hyperparameter Optimization (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  ICASSP 2014
- Place of Presentation
  Fortezza dal Basso, Florence, Italy
- Year and Date
  2014-05-04 – 2014-05-09
- Related Report
  2014 Annual Research Report
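[Presentation] Speech Parameter Generation Based on Gaussian Process Regression Considering Global Variance (系列内変動を考慮したガウス過程回帰に基づく音声パラメータ生成) (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  Acoustical Society of Japan, 2014 Spring Meeting
- Place of Presentation
  Nihon University, College of Science and Technology, Surugadai Campus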
- Related Report
  2013 Annual Research Report
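[Presentation] A Study of Hyperparameter Optimization in Speech Synthesis Based on Gaussian Process Regression (ガウス過程回帰に基づく音声合成におけるハイパーパラメータ最適化の検討) (2014)
- Author(s)
  Tomoki Koriyama
- Organizer
  IEICE Technical Committee on Speech, January Meeting
- Place of Presentation
  Meijo University, Tempaku Campus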
- Related Report
  2013 Annual Research Report
[Presentation] Frame-level Acoustic Modeling Based on Gaussian Process Regression for Statistical Nonparametric Speech Synthesis (2013)
- Author(s)
  Tomoki Koriyama
- Organizer
  The 38th International Conference on Acoustics, Speech, and Signal Processing
- Place of Presentation
  Vancouver Convention & Exhibition Centre, Canada
- Related Report
  2013 Annual Research Report
[Presentation] Statistical nonparametric speech synthesis using sparse Gaussian processes (2013)
- Author(s)
  Tomoki Koriyama
- Organizer
  14th Annual Conference of the International Speech Communication Association
- Place of Presentation
  Lyon Convention Center, France
- Related Report
  2013 Annual Research Report
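[Presentation] Speech Synthesis Based on Gaussian Process Regression Using Sparse Approximation and Convolution Kernels (スパース近似と畳み込みカーネルを用いたガウス過程回帰に基づく音声合成) (2013)
- Author(s)
  Tomoki Koriyama
- Organizer
  Acoustical Society of Japan, 2013 Autumn Meeting
- Place of Presentation
  Toyohashi University of Technology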
- Related Report
  2013 Annual Research Report
