2013 Fiscal Year Research-status Report

Research on Hybrid Speech Synthesis Based on Unit Normalization for Generating Diverse, Natural-Sounding Speech

Research Project

Project/Area Number	25730106
Research Category	Grant-in-Aid for Young Scientists (B)
Research Institution	Tohoku University
Principal Investigator	Takashi Nose, Tohoku University, Graduate School of Engineering, Lecturer (90550591)
Project Period (FY)	2013-04-01 – 2015-03-31
Keywords	speech synthesis / hybrid / high quality / diversification
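Research Abstract	This research aims to synthesize natural, high-quality speech from limited speech resources alone. This fiscal year, results were obtained on the following items. (1) Preparation for recording the long-duration speech data needed for evaluation and comparison experiments: recording sentences were prepared and written in order to record about 5 hours of speech. These include character-dependent sentences that match the target speaker's way of speaking. Using sentences that carry the character's personality, rather than only conventional read-style sentences, is expected to improve speaker individuality. (2) Theoretical framework of the proposed hybrid speech synthesis: a basic theory for unit normalization was constructed. Specifically, within the framework of model adaptation, the speech parameters of units are converted by a linear transformation, which is expected to reduce the variation among units. (3) Preliminary study of hybrid speech synthesis: preliminary experiments were conducted on a small amount of speech data, and operation was confirmed. (4) Study of techniques for diversifying speech: as related work, techniques for diversifying style and language in speech and singing voice were studied. For speech, a speaker-independent style conversion method was proposed, and it was shown that speech with emotional expression or a speaking style can be generated for an arbitrary speaker from only that speaker's reading-style speech. For singing voice, a singing-style control method based on a multiple-regression model was proposed, and it was shown that singing style can be controlled intuitively in singing voice synthesis. For language diversification, cross-lingual speech synthesis was proposed, which can synthesize foreign-language speech from only an arbitrary speaker's native-language speech.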
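Current Status of Research Progress	3: Progress in research has been slightly delayed. Reason: Evaluating the proposed method and comparing it with conventional methods requires long-duration speech data, but preparing for that recording took time, so the work was limited to speech recording and a preliminary study of the proposed method.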
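Strategy for Future Research Activity	Preparation for the speech recording, which was the time-consuming task, has now been completed, so this fiscal year the speech recording, establishment of the theory, implementation, and evaluation will be carried out as planned.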
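Expenditure Plans for the Next FY Research Funding	Because preparation for the planned recording took time, the speech recording could not be completed, and its cost was carried over to the next fiscal year. The carried-over funds will be used together with the amount requested for FY2014 (Heisei 26) to cover the expenses required for the speech recording.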

Research Products
(13 results)

Journal Article (7 results; of which Peer Reviewed: 4 results, Acknowledgement Compliant: 1 result), Presentation (6 results)

[Journal Article] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis (2014)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Speech Communication
  
  Volume: 57 Pages: 144-154
- DOI
  10.1016/j.specom.2013.09.014
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] 共有決定木を利用した話者適応に基づくクロスリンガル音声合成の評価 (2014)
- Author(s)
  長濱大樹, 能勢隆, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2014年春季研究発表会講演論文集
  
  Volume: vol.1 Pages: 413-414
[Journal Article] 音声合成のための音韻・韻律コンテキストを考慮した文選択アルゴリズムの評価 (2014)
- Author(s)
  荒生侑介, 能勢隆, 郡山知樹, 篠崎隆宏, 小林隆夫
- Journal Title
  
  日本音響学会2014年春季研究発表会講演論文集
  
  Volume: vol.1 Pages: 405-406
[Journal Article] Speaker-independent style conversion for HMM-based expressive speech synthesis (2013)
- Author(s)
  Hiroki Kanagawa, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proc. 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
  
  Volume: vol.1 Pages: 7864-7867
- Peer Reviewed
[Journal Article] HMM-based expressive speech synthesis based on phrase-level F0 context labeling (2013)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Proc. 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
  
  Volume: vol.1 Pages: 7859-7863
- Peer Reviewed
[Journal Article] A style control technique for singing voice synthesis based on multiple-regression HSMM (2013)
- Author(s)
  Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
  
  Volume: vol.1 Pages: 378-382
- Peer Reviewed
[Journal Article] 複数ドメインコーパスからの文選択に基づくキャラクター音声合成の検討 (2013)
- Author(s)
  荒生侑介, 能勢隆, 篠崎隆宏, 小林隆夫
- Journal Title
  
  日本音響学会2013年秋季研究発表会講演論文集
  
  Volume: vol.1 Pages: 351-352
[Presentation] Speaker-independent style conversion for HMM-based expressive speech synthesis
- Author(s)
  Hiroki Kanagawa
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
- Place of Presentation
  Vancouver, Canada
[Presentation] HMM-based expressive speech synthesis based on phrase-level F0 context labeling
- Author(s)
  Yu Maeno
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
- Place of Presentation
  Vancouver, Canada
[Presentation] A style control technique for singing voice synthesis based on multiple-regression HSMM
- Author(s)
  Takashi Nose
- Organizer
  14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
- Place of Presentation
  Lyon, France
[Presentation] 複数ドメインコーパスからの文選択に基づくキャラクター音声合成の検討
- Author(s)
  荒生侑介
- Organizer
  日本音響学会2013年秋季研究発表会
- Place of Presentation
  豊橋技術科学大学
[Presentation] 共有決定木を利用した話者適応に基づくクロスリンガル音声合成の評価
- Author(s)
  長濱大樹
- Organizer
  日本音響学会2014年春季研究発表会
- Place of Presentation
  日本大学
[Presentation] 音声合成のための音韻・韻律コンテキストを考慮した文選択アルゴリズムの評価
- Author(s)
  荒生侑介
- Organizer
  日本音響学会2014年春季研究発表会
- Place of Presentation
  日本大学
