Expressive Multi-language Speech Synthesis Based on the Generation Process Model and Its Use for Automatic Speech Translation

Research Project

Project/Area Number	21300061
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	The University of Tokyo
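Principal Investigator	HIROSE Keikichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	MINEMATSU Nobuaki The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (90273333)
Co-Investigator(Renkei-kenkyūsha)	KAWAI Goh Hokkaido University, Research Faculty of Media and Communication, Associate Professor (70312981)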
Project Period (FY)	2009 – 2011
Project Status	Completed (Fiscal Year 2011)
Budget Amount	¥17,550,000 (Direct Cost: ¥13,500,000, Indirect Cost: ¥4,050,000); Fiscal Year 2011: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000); Fiscal Year 2010: ¥5,330,000 (Direct Cost: ¥4,100,000, Indirect Cost: ¥1,230,000); Fiscal Year 2009: ¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000)
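Keywords	generation process model / fundamental frequency contour / corpus-based prosody control / automatic speech translation / discourse focus / HMM-based speech synthesis / voice quality and tone / speech morphing / speaking style / tone nucleus / multilingual / phoneme duration / utterance focus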
Research Abstract	A unified study of prosody control across multiple languages was conducted based on the generation process model of fundamental frequency contours (F_0 model). We developed a prosody adaptation method in which differences in F_0 model commands are learned from a parallel speech corpus and applied to baseline speech. With this method, focus control, style conversion, and voice conversion were realized. Furthermore, by using the F_0 model to approximate the F_0 contours of the training speech corpus and/or the generated F_0 contours, we improved the quality of speech synthesized by HMM-based speech synthesis, and we also added focus control. Based on these results, experiments were conducted on conveying discourse information and intentions in speech translation.
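
The F_0 model named in the abstract is commonly known as the Fujisaki model: the logarithm of F_0 is written as a baseline value plus the responses to phrase commands (impulses) and accent commands (steps). The sketch below is only an illustration of that formulation and of the idea of adapting prosody by adding differences to command values. The constants 3.0, 20.0, and 0.9 are commonly quoted defaults, and every command timing, magnitude, and difference value, as well as the helper name `apply_accent_deltas`, is a hypothetical choice made for this example and is not taken from the project.

```python
import math

# Commonly quoted default constants (assumptions, not values from the project).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents):
    """Return ln F_0 at time t (seconds).

    phrases: list of (onset, magnitude)
    accents: list of (onset, offset, amplitude)
    """
    value = math.log(fb)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def apply_accent_deltas(accents, deltas):
    """Hypothetical adaptation step: add one amplitude difference per accent command."""
    return [(t1, t2, aa + d) for (t1, t2, aa), d in zip(accents, deltas)]


# Illustrative commands only: one phrase command and two accent commands.
baseline_phrases = [(0.0, 0.5)]
baseline_accents = [(0.3, 0.6, 0.4), (0.9, 1.3, 0.3)]

# Raise the second accent command, as a stand-in for placing focus on that word.
focused_accents = apply_accent_deltas(baseline_accents, [0.0, 0.2])

times = [i * 0.1 for i in range(16)]
baseline = [math.exp(log_f0(t, 100.0, baseline_phrases, baseline_accents)) for t in times]
focused = [math.exp(log_f0(t, 100.0, baseline_phrases, focused_accents)) for t in times]
```

In this sketch the two contours coincide until the second accent command begins, after which the adapted contour is higher. In the project the differences are described as learned from a parallel speech corpus, whereas here they are simply fixed numbers.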

Report

(4 results)

2011 Annual Research Report, Final Research Report (PDF)
2010 Annual Research Report
2009 Annual Research Report

Research Products
(33 results)

All Journal Article (13 results) (of which Peer Reviewed: 13 results) Presentation (16 results) Book (3 results) Remarks (1 results)

[Journal Article] A method for generation of Mandarin F_0 contours based on tone nucleus model and superpositional model (2012)
- Author(s)
  Qinghua Sun, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Speech Communication
  
  Volume: Vol.54, Issue 8 Pages: 932-945
- URL
  http://www.sciencedirect.com/science/journal/01676393
- Related Report
  2011 Final Research Report
- Peer Reviewed
[Journal Article] HMM-based F_0 contour synthesis using the generation process model (2012)
- Author(s)
  Tatsuya Matsuda, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Acoustical Science and Technology, Acoustical Society of Japan
- NAID
  110007969995
- URL
  https://www.jstage.jst.go.jp/browse/ast/33/0/_contents,http://journals.acoustics.jp/ast-archive/
- Related Report
  2011 Final Research Report
- Peer Reviewed
[Journal Article] Improvement of prosody in HMM-based speech synthesis using generation process model (2011)
- Author(s)
  Miaomiao Wang, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Journal of Signal Processing
  
  Volume: vol.15, no.4 Pages: 279-282
- URL
  http://www.risp.jp/Product.html
- Related Report
  2011 Final Research Report
- Peer Reviewed
[Journal Article] Improvement of prosody in HMM-based speech synthesis using generation process model (2011)
- Author(s)
  Miaomiao Wang
- Journal Title
  
  Journal of Research Institute of Signal Processing
  
  Volume: 15 Pages: 279-282
- Related Report
  2011 Annual Research Report
- Peer Reviewed
[Journal Article] Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency (2011)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: 1 Pages: 2793-2796
- Related Report
  2011 Annual Research Report
- Peer Reviewed
[Journal Article] HMM-based F_0 contour synthesis using the generation process model (2011)
- Author(s)
  Tatsuya Matsuda
- Journal Title
  
  Acoustical Science and Technology, Acoustical Society of Japan
  
  Volume: (in press; accepted for publication)
- Related Report
  2011 Annual Research Report
- Peer Reviewed
[Journal Article] Improvement of prosody in HMM-based speech synthesis using generation process model (2011)
- Author(s)
  Miaomiao Wang
- Journal Title
  
  Journal of Signal Processing
  
  Volume: 15 (to appear in the July issue)
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] HMM-based synthesis of fundamental frequency contours using the generation process model (2010)
- Author(s)
  Tetsuya Matsuda, Keikichi Hirose and Nobuaki Minematsu
- Journal Title
  
  Journal of Signal Processing
  
  Volume: vol.14, no.4 Pages: 277-280
- URL
  http://www.risp.jp/Product.html
- Related Report
  2011 Final Research Report
- Peer Reviewed
[Journal Article] HMM-based synthesis of fundamental frequency contours using the generation process model (2010)
- Author(s)
  Tetsuya Matsuda
- Journal Title
  
  Journal of Signal Processing
  
  Volume: 14 Pages: 277-280
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Improving Mandarin segmental duration prediction with automatically extracted syntax features (2010)
- Author(s)
  Miaomiao Wen
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: 1 Pages: 2178-2181
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] HMM-Based synthesis of fundamental frequency contours using the generation process model (2010)
- Author(s)
  Tetsuya Matsuda
- Journal Title
  
  Proceedings of International Workshop on Nonlinear Circuits and Signal Processing 1
  
  Pages: 464-467
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Generation of fundamental frequency in HMM-based TTS using generation process model (2010)
- Author(s)
  Miaomiao Wang
- Journal Title
  
  Proceedings of International Conference on Speech Prosody 1 (in press; accepted for publication)
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model (2009)
- Author(s)
  Keiko Ochi
- Journal Title
  
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 1
  
  Pages: 4485-4488
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Presentation] Fundamental frequency contour generation process model for improved and flexible control of prosodic features in HMM-based speech synthesis (2012)
- Author(s)
  Keikichi Hirose
- Organizer
  International Symposium on Frontiers of Research on Speech and Music
- Place of Presentation
  KIIT, Gurgaon, India (invited talk)
- Year and Date
  2012-01-19
- Related Report
  2011 Annual Research Report
[Presentation] Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based speech synthesis (2012)
- Author(s)
  Hiroya Hashimoto, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings INTERSPEECH
- Place of Presentation
  Portland
- Related Report
  2011 Final Research Report
[Presentation] Emotional voice conversion for Mandarin using tone nucleus model - small corpus and high efficiency (2012)
- Author(s)
  Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings of International Conference on Speech Prosody
- Place of Presentation
  Shanghai
- Related Report
  2011 Final Research Report
[Presentation] Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model (2012)
- Author(s)
  Keikichi Hirose, Hiroya Hashimoto, Jun Ikeshima, and Nobuaki Minematsu
- Organizer
  Proceedings of International Conference on Speech Prosody
- Place of Presentation
  Shanghai
- Related Report
  2011 Final Research Report
[Presentation] Fundamental frequency contour generation process model for improved and flexible control of prosodic features in HMM-based speech synthesis (2012)
- Author(s)
  Keikichi Hirose
- Organizer
  Proceedings of International Symposium on Frontiers of Research on Speech and Music
- Place of Presentation
  Gurgaon
- Related Report
  2011 Final Research Report
[Presentation] Representing fundamental frequency contours generated by HMM-based speech synthesis using generation process model (2011)
- Author(s)
  Keikichi Hirose, Tatsuya Matsuda, Hiroya Hashimoto, and Nobuaki Minematsu
- Organizer
  Proceedings of IEEE International Workshop on Machine Learning for Signal Processing
- Place of Presentation
  Beijing
- Related Report
  2011 Final Research Report
[Presentation] Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency (2011)
- Author(s)
  Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, and Nobuaki Minematsu
- Organizer
  Proceedings INTERSPEECH
- Place of Presentation
  Florence
- Related Report
  2011 Final Research Report
[Presentation] Prosody conversion for emotional Mandarin speech synthesis using the tone nucleus model (2011)
- Author(s)
  Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings INTERSPEECH
- Place of Presentation
  Florence
- Related Report
  2011 Final Research Report
[Presentation] Control of prosodic features in corpus-based generation of fundamental frequency contours based on the generation process model (2010)
- Author(s)
  Keikichi Hirose
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Taiyangdao Hotel, Beijing, China
- Year and Date
  2010-10-27
- Related Report
  2010 Annual Research Report
[Presentation] Analysis and Synthesis of F_0 Contours for Bangla Readout Speech (2010)
- Author(s)
  Shyamal Das Mandal, Anal Haque Warsi, Tulika Basu, Keikichi Hirose, and Hiroya Fujisaki
- Organizer
  Proceedings Oriental COCOSDA
- Place of Presentation
  Kathmandu
- Related Report
  2011 Final Research Report
[Presentation] Control of prosodic features in corpus-based generation of fundamental frequency contours based on the generation process model (2010)
- Author(s)
  Keikichi Hirose, Keiko Ochi, and Nobuaki Minematsu
- Organizer
  Proceedings IEEE International Conference on Signal Processing
- Place of Presentation
  Beijing
- Related Report
  2011 Final Research Report
[Presentation] Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model (2010)
- Author(s)
  Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings INTERSPEECH
- Place of Presentation
  Makuhari
- Related Report
  2011 Final Research Report
[Presentation] Using F_0 contour generation process model for improved and flexible control of prosodic features in HMM-based speech synthesis (2010)
- Author(s)
  Keikichi Hirose, Keiko Ochi, Miaomiao Wang, Tatsuya Matsuda, Miaomiao Wen, and Nobuaki Minematsu
- Organizer
  Proceedings of 21st Conference on Electronic Speech Signal Processing
- Place of Presentation
  Berlin
- Related Report
  2011 Final Research Report
[Presentation] Generation of fundamental frequency in HMM-based TTS using generation process model (2010)
- Author(s)
  Miaomiao Wang, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings of International Conference on Speech Prosody
- Place of Presentation
  Chicago
- Related Report
  2011 Final Research Report
[Presentation] Control of prosodic features based on the super-positional representation of F_0 contours: toward flexible control of prosodic features in speech synthesis (2009)
- Author(s)
  Keikichi Hirose
- Organizer
  International Workshop on Spoken Language Prosody
- Place of Presentation
  C-DAC, Kolkata, India
- Year and Date
  2009-11-25
- Related Report
  2009 Annual Research Report
[Presentation] Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model (2009)
- Author(s)
  Keiko Ochi, Keikichi Hirose, and Nobuaki Minematsu
- Organizer
  Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
- Place of Presentation
  Taipei
- Related Report
  2011 Final Research Report
[Book] Prosodic corpora based on F_0 contour generation model and automatic extraction of model parameters (2010)
- Author(s)
  Keikichi Hirose
- Publisher
  Computer Processing of Asian Spoken languages
- Related Report
  2011 Final Research Report
[Book] Speech prosody corpora based on F_0 contour generation model and automatic extraction of model parameters, in Computer Processing of Asian Spoken languages, edited by Shuichi Itahashi et al. (2010)
- Author(s)
  Keikichi Hirose
- Total Pages
  372
- Publisher
  Consideration Books, Los Angeles
- Related Report
  2009 Annual Research Report
[Book] On the prosodic features for emotional speech (2009)
- Author(s)
  Keikichi Hirose, Qinghua Sun
- Publisher
  Frontiers in Phonetics and Speech Science
- Related Report
  2011 Final Research Report
[Remarks] (Research achievements)
- URL
  http://www.gavo.t.u-tokyo.ac.jp/~hirose/cv/curriculumvitae.pdf
- Related Report
  2011 Final Research Report
