2013 Fiscal Year Final Research Report

A study on speech diversification techniques based on corpus design for advanced humanoid speech synthesis

Research Project

Project/Area Number	23700195
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perception information processing/Intelligent robotics
Research Institution	Tohoku University (2013); Tokyo Institute of Technology (2011-2012)
Principal Investigator	NOSE Takashi, Tohoku University, Graduate School of Engineering, Lecturer (90550591)
Project Period (FY)	2011 – 2012
Keywords	speech synthesis / hidden Markov model / statistical speech synthesis / emotional speech synthesis / humanoid robot / speech corpus
Research Abstract	The goal of this research is to realize a more human-like, natural text-to-speech system with a variety of emotional expressions and speaking styles. The achievements are as follows: (1) We proposed a novel corpus-design technique that takes accent, style, and sentence-final expression into account. (2) We incorporated users' subjective emotional intensities into acoustic model training to improve the performance of expressive speech synthesis. (3) We proposed a technique for automatically labeling emphasis expressions, based on fundamental frequency (F0) parameter generation, to realize emphatic speech synthesis. (4) We proposed a cross-lingual speech synthesis technique that uses only speech samples in the target speaker's native language to synthesize multilingual speech at low cost.

Research Products
(34 results)

Journal Article (20 results, of which Peer Reviewed: 20 results) / Presentation (14 results)

[Journal Article] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis (2014)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Speech Communication
  
  Volume: Vol.57 Pages: 144-154
- DOI
  10.1016/j.specom.2013.09.014
- Peer Reviewed
[Journal Article] Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis (2013)
- Author(s)
  Tomohiro Nagata, Hiroki Mori, Takashi Nose
- Journal Title
  
  Proceedings of 14th Annual Conference of the International Speech Communication Association (ISCA)
  
  Pages: 1549-1553
- Peer Reviewed
[Journal Article] Statistical nonparametric speech synthesis using sparse Gaussian processes (2013)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 14th Annual Conference of the International Speech Communication Association (ISCA)
  
  Pages: 1072-1076
- Peer Reviewed
[Journal Article] A style control technique for singing voice synthesis based on multiple-regression HSMM (2013)
- Author(s)
  Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of 14th Annual Conference of the International Speech Communication Association (ISCA)
  
  Pages: 378-382
- Peer Reviewed
[Journal Article] Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis (2013)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Pages: 8007-8011
- Peer Reviewed
[Journal Article] Speaker-independent style conversion for HMM-based expressive speech synthesis (2013)
- Author(s)
  Hiroki Kanagawa, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Pages: 7864-7868
- Peer Reviewed
[Journal Article] HMM-based expressive speech synthesis based on phrase-level F0 context labeling (2013)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Proceedings of 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Pages: 7859-7863
- Peer Reviewed
[Journal Article] An intuitive style control technique in HMM-based expressive speech synthesis using subjective style intensity and multiple-regression global variance model (2013)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.55, No.2 Pages: 347-357
- DOI
  10.1016/j.specom.2012.09.003
- Peer Reviewed
[Journal Article] A speech parameter generation algorithm using local variance for HMM-based speech synthesis (2012)
- Author(s)
  Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 13th Annual Conference of the International Speech Communication Association (ISCA)
  
  Pages: 1151-1154
- Peer Reviewed
[Journal Article] Discontinuous observation HMM for prosodic-event-based F0 generation (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 13th Annual Conference of the International Speech Communication Association (ISCA)
  
  Pages: 462-465
- Peer Reviewed
[Journal Article] An F0 modeling technique based on prosodic events for spontaneous speech synthesis (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012)
  
  Pages: 4589-4593
- Peer Reviewed
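[Journal Article] Context extension for generating diverse prosody in HMM-based dialogue speech synthesis (in Japanese) (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  IEICE Transactions (Japanese Edition) (電子情報通信学会論文誌)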
  
  Volume: Vol.J95-D, No.3 Pages: 597-607
- Peer Reviewed
[Journal Article] Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols (2012)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Pages: 384-392
- DOI
  10.1016/j.specom.2011.10.002
- Peer Reviewed
[Journal Article] A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis (2012)
- Author(s)
  Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.54, No.2 Pages: 245-255
- DOI
  10.1016/j.specom.2011.08.006
- Peer Reviewed
[Journal Article] Recent development of HMM-based expressive speech synthesis and its applications (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2011 Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference
- URL
  http://www.apsipa.org/proceedings_2011/pdf/APSIPA189.pdf
- Peer Reviewed
[Journal Article] Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.53, No.7 Pages: 973-985
- DOI
  10.1016/j.specom.2011.05.001
- Peer Reviewed
[Journal Article] On the use of extended context for HMM-based spontaneous conversational speech synthesis (2011)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 12th Annual Conference of the International Speech Communication Association (ISCA) (INTERSPEECH 2011)
  
  Pages: 2657-2660
- Peer Reviewed
[Journal Article] Performance prediction of speech recognition using average-voice-based speech synthesis (2011)
- Author(s)
  Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii
- Journal Title
  
  Proceedings of 12th Annual Conference of the International Speech Communication Association (ISCA) (INTERSPEECH 2011)
  
  Pages: 1953-1956
- Peer Reviewed
[Journal Article] HMM-based emphatic speech synthesis using unsupervised context labeling (2011)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Proceedings of 12th Annual Conference of the International Speech Communication Association (ISCA) (INTERSPEECH 2011)
  
  Pages: 1849-185
- Peer Reviewed
[Journal Article] A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 12th Annual Conference of the International Speech Communication Association (ISCA) (INTERSPEECH 2011)
  
  Pages: 109-112
- Peer Reviewed
[Presentation] Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis (2013)
- Author(s)
  Tomohiro Nagata, Hiroki Mori, Takashi Nose
- Organizer
  INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Year and Date
  2013-08-27
[Presentation] Statistical nonparametric speech synthesis using sparse Gaussian processes (2013)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Year and Date
  2013-08-27
[Presentation] A style control technique for singing voice synthesis based on multiple-regression HSMM (2013)
- Author(s)
  Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi
- Organizer
  INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Year and Date
  2013-08-26
[Presentation] Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis (2013)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  ICASSP 2013
- Place of Presentation
  Vancouver, Canada
- Year and Date
  2013-05-31
[Presentation] Speaker-independent style conversion for HMM-based expressive speech synthesis (2013)
- Author(s)
  Hiroki Kanagawa, Takashi Nose, Takao Kobayashi
- Organizer
  ICASSP 2013
- Place of Presentation
  Vancouver, Canada
- Year and Date
  2013-05-31
[Presentation] HMM-based expressive speech synthesis based on phrase-level F0 context labeling (2013)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Organizer
  ICASSP 2013
- Place of Presentation
  Vancouver, Canada
- Year and Date
  2013-05-31
[Presentation] A speech parameter generation algorithm using local variance for HMM-based speech synthesis (2012)
- Author(s)
  Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
- Organizer
  INTERSPEECH 2012
- Place of Presentation
  Portland, USA
- Year and Date
  2012-09-11
[Presentation] Discontinuous observation HMM for prosodic-event-based F0 generation (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  INTERSPEECH 2012
- Place of Presentation
  Portland, USA
- Year and Date
  2012-09-10
[Presentation] An F0 modeling technique based on prosodic events for spontaneous speech synthesis (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  ICASSP 2012
- Place of Presentation
  Kyoto, Japan
- Year and Date
  2012-03-29
[Presentation] Recent development of HMM-based expressive speech synthesis and its applications (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Organizer
  APSIPA ASC 2011
- Place of Presentation
  Xi'an, China
- Year and Date
  2011-10-19
[Presentation] On the use of extended context for HMM-based spontaneous conversational speech synthesis (2011)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Organizer
  INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-30
[Presentation] Performance prediction of speech recognition using average-voice-based speech synthesis (2011)
- Author(s)
  Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii
- Organizer
  INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-29
[Presentation] HMM-based emphatic speech synthesis using unsupervised context labeling (2011)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Organizer
  INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-29
[Presentation] A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Organizer
  INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-28
