Development of statistically consistent voice conversion techniques based on joint feature modeling

Research Project

Project/Area Number	24700166
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perception information processing/Intelligent robotics
Research Institution	Nagoya Institute of Technology
Principal Investigator	NANKAKU Yoshihiko Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)
Project Period (FY)	2012-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000); Fiscal Year 2014: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2013: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2012: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	Voice conversion / International information exchange
Outline of Final Research Achievements	This project aimed to improve voice conversion techniques, which convert speech waveforms from an original speaker's voice to another speaker's. In conventional voice conversion, spectral features and prosodic features such as fundamental frequency (F0) and speaking rate are converted independently. In the proposed technique, these features are modeled consistently by a single statistical model, and all features are converted jointly by exploiting the correlations among them. Experimental results showed that the proposed technique improved the speech quality of converted voices. The project also developed a technique to improve voice conversion when only a very small amount of training data is available.

Report

(4 results)

2014 Annual Research Report
Final Research Report (PDF)
2013 Research-status Report
2012 Research-status Report

Research Products

(22 results)


Journal Article (3 results, of which Peer Reviewed: 3); Presentation (19 results, of which Invited: 1)

[Journal Article] Spectral modeling with contextual additive structure for HMM-based speech synthesis (2014)
- Author(s)
  Shinji Takaki, Yoshihiko Nankaku and Keiichi Tokuda
- Journal Title
  
  IEEE Journal of Selected Topics in Signal Processing
  
  Volume: 8 Issue: 2 Pages: 229-238
- DOI
  10.1109/jstsp.2014.2305919
- Related Report
  2014 Annual Research Report 2013 Research-status Report
- Peer Reviewed
[Journal Article] Integration of spectral feature extraction and modeling for HMM-based speech synthesis (2014)
- Author(s)
  Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E97-D Issue: 6 Pages: 1438-1448
- NAID
  130004841776
- Related Report
  2014 Annual Research Report 2013 Research-status Report
- Peer Reviewed
[Journal Article] A Bayesian Framework Using Multiple Model Structures for Speech Recognition (2013)
- Author(s)
  Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E96.D Issue: 4 Pages: 939-948
- DOI
  10.1587/transinf.E96.D.939
- NAID
  10031182859
- ISSN
  0916-8532, 1745-1361
- Related Report
  2012 Research-status Report
- Peer Reviewed
[Presentation] A study on the use of generative models in neural-network-based speech synthesis (2014)
- Author(s)
  橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Annual Research Report
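[Presentation] A study on basis clustering in HMM-based speech synthesis based on factor analysis (2014)
- Author(s)
  吉村建慶，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University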
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Annual Research Report
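[Presentation] A study on HMM-based speech synthesis integrating H/L-type accent estimation and acoustic modeling (2014)
- Author(s)
  神谷翔大，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University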
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Annual Research Report
[Presentation] Speech research as a statistical machine learning problem (2014)
- Author(s)
  南角吉彦
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Hotel Hanamaki
- Year and Date
  2014-07-24 – 2014-07-26
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Integration of speaker and pitch adaptive training for HMM-based singing voice synthesis (2014)
- Author(s)
  Kanako Shirota, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  2014 IEEE International Conference on Acoustics, Speech, and Signal Processing
- Place of Presentation
  Florence, Italy
- Year and Date
  2014-05-06 – 2014-05-09
- Related Report
  2014 Annual Research Report
[Presentation] HMM-based singing voice synthesis and its application to Japanese and English (2014)
- Author(s)
  Kazuhiro Nakamura, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  2014 IEEE International Conference on Acoustics, Speech, and Signal Processing
- Place of Presentation
  Florence, Italy
- Year and Date
  2014-05-06 – 2014-05-09
- Related Report
  2014 Annual Research Report
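[Presentation] A study on cross-lingual speaker adaptation based on joint eigenvoices using an expression-word space (2014)
- Author(s)
  佐藤雄介，中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)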
- Related Report
  2013 Research-status Report
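[Presentation] A study on voice conversion using weighted conversion functions based on GMM posterior probabilities (2014)
- Author(s)
  鶴野高輝，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)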
- Related Report
  2013 Research-status Report
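[Presentation] A study on LSP-related feature representations in HMM-based speech synthesis (2014)
- Author(s)
  有竹貴士，中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)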
- Related Report
  2013 Research-status Report
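[Presentation] A study on mel-cepstral analysis considering restoration of high-band components of speech data sampled at a low frequency (2014)
- Author(s)
  中村和寛，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)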
- Related Report
  2013 Research-status Report
[Presentation] A study on HMM-based speech synthesis using state-level contexts (2014)
- Author(s)
  大浦圭一郎，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University (Surugadai Campus)
- Related Report
  2013 Research-status Report
[Presentation] Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis (2013)
- Author(s)
  Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku and Keiichi Tokuda
- Organizer
  ISCA Speech Synthesis Workshop (SSW8)
- Place of Presentation
  Barcelona, Spain
- Related Report
  2013 Research-status Report
[Presentation] Contextual partial additive structure for HMM-based speech synthesis (2013)
- Author(s)
  Shinji Takaki, Yoshihiko Nankaku and Keiichi Tokuda
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
- Place of Presentation
  Vancouver, Canada
- Related Report
  2013 Research-status Report
[Presentation] Integration of acoustic modeling and mel-cepstral analysis for HMM-based speech synthesis (2013)
- Author(s)
  Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
- Place of Presentation
  Vancouver, Canada
- Related Report
  2013 Research-status Report
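[Presentation] A study on utterance adaptive training using factor analysis in HMM-based speech synthesis (2013)
- Author(s)
  桑子修一，高木信二，橋本佳，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo University of Technology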
- Related Report
  2012 Research-status Report
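[Presentation] Cross-lingual speaker adaptation based on factor analysis using bilingual data for HMM-based speech synthesis (2013)
- Author(s)
  吉村建慶，橋本佳，大浦圭一郎，南角吉彦，徳田恵一
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo University of Technology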
- Related Report
  2012 Research-status Report
[Presentation] Cross-lingual speaker adaptation for HMM-based speech synthesis using joint-eigenvoices with a space of perceptual characteristics (2013)
- Author(s)
  Viviane de Franca Oliveira, Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo University of Technology
- Related Report
  2012 Research-status Report
[Presentation] A Bayesian approach to speaker recognition based on GMMs using multiple model structures (2012)
- Author(s)
  Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Interspeech 2012
- Place of Presentation
  Portland, USA
- Related Report
  2012 Research-status Report
[Presentation] Cross-lingual speaker adaptation for HMM-based speech synthesis based on perceptual characteristics and speaker interpolation (2012)
- Author(s)
  Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Interspeech 2012
- Place of Presentation
  Portland, USA
- Related Report
  2012 Research-status Report
