2013 Fiscal Year Research-status Report

Development of a Voice Conversion Method Based on a Statistically Consistent Criterion

Research Project

Project/Area Number	24700166
Research Institution	Nagoya Institute of Technology
Principal Investigator	Yoshihiko Nankaku, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)
Keywords	Voice conversion
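Research Abstract	The purpose of this research is to develop a voice conversion method that converts spectral information, prosodic information, speaking rate, and related features in a unified manner from only a very small amount of training data. Conventional voice conversion methods focus only on spectral information, which represents timbre. The proposed method instead handles all information that carries speaker characteristics, including pitch, intonation, and speaking rate, within a single framework, so that the correlations among these features can be exploited to achieve high-accuracy conversion. In addition, we propose a method that applies the Bayesian criterion, which has recently been applied in speech recognition and speech synthesis, and uses a large amount of pre-collected background data as prior information, so that high-quality converted speech can be obtained even when only a small amount of data for the desired voice is available. Toward these goals, this fiscal year we built a theoretical framework for a model structure that converts spectrum, fundamental frequency, and duration simultaneously. We also constructed a model structure based on factor analysis for making effective use of a large amount of prior data, and implemented a voice conversion program based on it. Preliminary experiments confirmed that the derived training algorithm works as intended. We further examined a theoretical framework for using the factor-analysis-based model as a prior distribution. The component technologies for the final voice conversion model are thus coming together, and the research is progressing smoothly.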
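The abstract describes using a large pool of background data as prior information so that a small amount of target-voice data still yields a reliable model. The report does not give the equations, and the project's actual model (a factor-analysis structure used as a prior under a Bayesian criterion) is more elaborate. As a much simpler stand-in, the sketch below assumes conjugate-prior MAP estimation of a single Gaussian mean with known variance; the function names, the prior weight `tau`, and all numbers are hypothetical.

```python
# Minimal sketch (assumption): MAP estimation of one Gaussian mean, with the
# prior taken from background data. A simplified stand-in, not the project's model.


def sample_mean(values):
    """Maximum-likelihood estimate of the mean."""
    return sum(values) / len(values)


def map_mean(target, prior_mean, tau):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    `tau` is the prior strength expressed as a number of pseudo-observations
    located at `prior_mean`:
        (tau * prior_mean + sum(target)) / (tau + len(target))
    With few target samples the estimate stays close to the prior mean;
    as len(target) grows it approaches the target sample mean.
    """
    return (tau * prior_mean + sum(target)) / (tau + len(target))


# Hypothetical data: several background values, only two values from the target voice.
background = [5.0, 5.2, 4.8, 5.1, 4.9]
target = [6.0, 6.2]

prior_mean = sample_mean(background)
print("ML (target only):", round(sample_mean(target), 2))
print("MAP with prior:  ", round(map_mean(target, prior_mean, tau=8.0), 2))
```

With these numbers the MAP estimate (about 5.22) lies between the background mean (5.0) and the two-sample target mean (6.1); setting `tau=0` recovers the target-only estimate.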
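Current Status of Research Progress	2: Research has progressed generally as originally planned. Reason: The goal of this research is to establish a method for converting spectrum, fundamental frequency, and speaking rate simultaneously. This fiscal year, the theoretical groundwork for jointly modeling spectrum, fundamental frequency, and speaking rate was completed, and the foundation for evaluation experiments is being put in place. For the factor-analysis-based voice conversion model that exploits a large amount of background data, we went beyond theory: the model was implemented in software, and preliminary experiments confirmed that it works correctly. This model plays a key role in the Bayesian-criterion voice conversion named in the research objectives, and the component technologies for the final voice conversion model are steadily coming together. Furthermore, in the course of this research we devised a new improvement to voice conversion based on Gaussian mixture models (GMMs), on which the proposed method is built, and showed in evaluation experiments that it actually improves speech quality. Preparation of speech databases and construction of the experimental environment for the planned large-scale evaluation experiments are also proceeding smoothly. For these reasons, we judge that the research is progressing generally smoothly.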
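The report states that the proposed method builds on GMM-based voice conversion but does not spell out the mapping. The sketch below assumes the widely used joint-density GMM with a minimum mean-square error (conditional-expectation) conversion, reduced to scalar features and two hand-picked components; the class and function names and all parameter values are hypothetical. In miniature, it also shows the point made in the abstract: the conversion relies on the source-target covariance `cov_yx`.

```python
# Minimal sketch (assumption): joint-density GMM voice conversion with the
# MMSE mapping E[y | x], for scalar features. Parameters are hypothetical.
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixture weight w_m
    mu_x: float    # source mean
    mu_y: float    # target mean
    var_xx: float  # source variance
    cov_yx: float  # target-source covariance


def gauss(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, gmm):
    """P(m | x), computed from the source marginal of each joint component."""
    lik = [c.weight * gauss(x, c.mu_x, c.var_xx) for c in gmm]
    total = sum(lik)
    return [v / total for v in lik]


def convert(x, gmm):
    """E[y | x] = sum_m P(m | x) * (mu_y + cov_yx / var_xx * (x - mu_x))."""
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(posteriors(x, gmm), gmm)
    )


gmm = [
    Component(weight=0.5, mu_x=0.0, mu_y=1.0, var_xx=1.0, cov_yx=0.8),
    Component(weight=0.5, mu_x=4.0, mu_y=6.0, var_xx=1.0, cov_yx=0.5),
]
for x in (0.0, 2.0, 4.0):
    print(x, round(convert(x, gmm), 3))
```

Near each source mean the output follows that component's regression line; at x = 2.0 the posterior weights are equal and the result (3.8) is the average of the two component predictions. A practical system would use multivariate features (e.g. mel-cepstra) with covariance matrices in place of the scalar variance and covariance.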
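Strategy for Future Research Activity	Research to date has progressed generally as planned, and we will continue according to the original plan. Specifically, we will build a model for the simultaneous conversion of spectrum, fundamental frequency, and duration, and verify the effectiveness of the proposed method through evaluation experiments. In addition, because the theoretical verification and implementation of the model structure that serves as the prior distribution for Bayesian voice conversion progressed smoothly in FY2013, we will use it to build a voice conversion method that exploits a large amount of background data and examine its effectiveness. The research is moving from theoretical groundwork to the evaluation stage, and we will improve and strengthen the theoretical framework by feeding the findings of the evaluation experiments back into it. Because this research models speech with complex statistical models, computer experiments require an enormous amount of computation. Moreover, anticipating faster hardware in the future, we need to run experiments on high-performance computers even at scales that cannot currently run in real time. For this purpose we plan to add workstations. Evaluation experiments also require storing large amounts of speech data and other experimental data; we will handle this by expanding data storage while making use of existing equipment. For the next fiscal year's budget, we will increase travel expenses for presenting research results and expenses for publishing papers compared with the previous year. Results will be presented at domestic and international conferences (Acoustical Society of Japan, ISCA Interspeech, IEEE ICASSP, etc.).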

Research Products
(10 results: 2 journal articles, both peer reviewed, and 8 presentations, from 2013 and 2014)

[Journal Article] Spectral modeling with contextual additive structure for HMM-based speech synthesis (2014)
- Author(s)
  Shinji Takaki, Yoshihiko Nankaku and Keiichi Tokuda
- Journal Title
  IEEE Journal of Selected Topics in Signal Processing
  Volume: Vol. 8, Issue 2 Pages: 229-238
- DOI
  10.1109/JSTSP.2014.2305919
- Peer Reviewed
[Journal Article] Integration of spectral feature extraction and modeling for HMM-based speech synthesis (2014)
- Author(s)
  Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: Vol. E97-D, No. 6 Pages: 1438-1448
- Peer Reviewed
[Presentation] Cross-lingual speaker adaptation based on a concatenated eigenvoice method using an expression-word space (in Japanese) (2014)
- Author(s)
  佐藤雄介，中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Year and Date
  20140310-20140312
[Presentation] Voice conversion using weighted conversion functions based on GMM posterior probabilities (in Japanese) (2014)
- Author(s)
  鶴野高輝，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Year and Date
  20140310-20140312
[Presentation] LSP-related feature representations for HMM-based speech synthesis (in Japanese) (2014)
- Author(s)
  有竹貴士，中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Year and Date
  20140310-20140312
[Presentation] Mel-cepstral analysis considering restoration of high-frequency components of speech data sampled at a low sampling rate (in Japanese) (2014)
- Author(s)
  中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Year and Date
  20140310-20140312
[Presentation] HMM-based speech synthesis using state-level contexts (in Japanese) (2014)
- Author(s)
  大浦圭一郎，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Year and Date
  20140310-20140312
[Presentation] Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis (2013)
- Author(s)
  Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku and Keiichi Tokuda
- Organizer
  ISCA Speech Synthesis Workshop (SSW8)
- Place of Presentation
  Barcelona, Spain
- Year and Date
  20130831-20130902
[Presentation] Contextual partial additive structure for HMM-based speech synthesis (2013)
- Author(s)
  Shinji Takaki, Yoshihiko Nankaku and Keiichi Tokuda
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
- Place of Presentation
  Vancouver, Canada
- Year and Date
  20130526-20130531
[Presentation] Integration of acoustic modeling and mel-cepstral analysis for HMM-based speech synthesis (2013)
- Author(s)
  Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
- Place of Presentation
  Vancouver, Canada
- Year and Date
  20130526-20130531
