2011 Fiscal Year Final Research Report

Research on robust spoken language interfaces for diverse voice variability and expressivity

Research Project

Project/Area Number	21300063
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Tokyo Institute of Technology
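Principal Investigator	KOBAYASHI Takao Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Professor (70153616)
Co-Investigator (Renkei-kenkyūsha)	NAGAHASHI Hiroshi Tokyo Institute of Technology, Imaging Science and Engineering Laboratory, Professor (20143084)
Research Collaborator	NOSE Takashi Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Assistant Professor (90550591)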
Project Period (FY)	2009 – 2011
Keywords	Speech information processing
Research Abstract	The purpose of this research is to develop techniques that make human-computer interaction through speech input/output more robust to variations in users' emotional states, speaking styles, preferences, and expressivity. We proposed techniques that use a quantized fundamental frequency (F0) prosodic context for robust speech synthesis, and an extended context set for spontaneous conversational speech synthesis. We also proposed techniques for robust speech recognition, including extraction of paralinguistic information and rapid model adaptation.

Research Products
(20 results)

Journal Article (10 results) (of which Peer Reviewed: 10 results) Presentation (9 results) Remarks (1 result)

[Journal Article] Context extension for generating diverse prosody in HMM-based conversational speech synthesis (in Japanese) (2012)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition)
  
  Volume: Vol.J95-D Pages: 597-607
- Peer Reviewed
[Journal Article] Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols (2012)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.54 Pages: 384-392
- Peer Reviewed
[Journal Article] A tone-modeling technique using a quantized F0 context to improve tone correctness in average-voice-based speech synthesis (2012)
- Author(s)
  Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.54 Pages: 245-255
- Peer Reviewed
[Journal Article] Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency (2011)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: Vol.53 Pages: 973-985
- Peer Reviewed
[Journal Article] HMM-based voice conversion using quantized F0 context (2010)
- Author(s)
  Takashi Nose, Yuhei Ota, Takao Kobayashi
- Journal Title
  
  IEICE Trans. on Information and Systems
  
  Volume: Vol.E93-D Pages: 2483-2490
- Peer Reviewed
[Journal Article] Evaluation of prosodic contextual factors for HMM-based speech synthesis (2010)
- Author(s)
  Shuji Yokomizo, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proc. 11th Annual Conference of the International Speech Communication Association
  
  Pages: 430-433
- Peer Reviewed
[Journal Article] Conversational spontaneous speech synthesis using average voice model (2010)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proc. 11th Annual Conference of the International Speech Communication Association
  
  Pages: 853-856
- Peer Reviewed
[Journal Article] HMM-based speech synthesis with unsupervised labeling of accentual context based on F0 quantization and average voice model (2010)
- Author(s)
  Takashi Nose, Koujirou Ooki, Takao Kobayashi
- Journal Title
  
  Proc. 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Pages: 4622-4625
- Peer Reviewed
[Journal Article] A rapid model adaptation technique for emotional speech recognition with style estimation based on multiple-regression HMM (2010)
- Author(s)
  Yusuke Ijima, Takashi Nose, Makoto Tachibana, Takao Kobayashi
- Journal Title
  
  IEICE Trans. on Information and Systems
  
  Volume: Vol.E93-D Pages: 107-115
- Peer Reviewed
[Journal Article] A technique for estimating intensity of emotional expressions and speaking styles in speech based on multiple-regression HSMM (2010)
- Author(s)
  Takashi Nose, Takao Kobayashi
- Journal Title
  
  IEICE Trans. on Information and Systems
  
  Volume: Vol.E93-D Pages: 116-124
- Peer Reviewed
[Presentation] An F0 modeling technique based on prosodic events for spontaneous speech synthesis (2012)
- Author(s)
  Tomoki Koriyama
- Organizer
  2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
- Place of Presentation
  Kyoto, Japan
- Year and Date
  2012-03-29
[Presentation] A study on speaker-independent style conversion in HMM-based speech synthesis (in Japanese) (2011)
- Author(s)
  Hiroki Kanagawa
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Shibaura Institute of Technology, Koto-ku, Tokyo
- Year and Date
  2011-12-20
[Presentation] On the use of extended context for HMM-based spontaneous conversational speech synthesis (2011)
- Author(s)
  Tomoki Koriyama
- Organizer
  12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-30
[Presentation] A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM (2011)
- Author(s)
  Takashi Nose
- Organizer
  12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011
- Place of Presentation
  Florence, Italy
- Year and Date
  2011-08-28
[Presentation] Very low bit-rate F0 coding for phonetic vocoder using MSD-HMM with quantized F0 context (2011)
- Author(s)
  Takashi Nose
- Organizer
  2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
- Place of Presentation
  Prague, Czech Republic
- Year and Date
  2011-05-26
[Presentation] Tonal context labeling using quantized F0 symbols for improving tone correctness in average-voice-based speech synthesis (2011)
- Author(s)
  Vataya Chunwijitra
- Organizer
  2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
- Place of Presentation
  Prague, Czech Republic
- Year and Date
  2011-05-24
[Presentation] Acquisition of nonverbal information by face pose estimation from video (in Japanese) (2010)
- Author(s)
  宮崎悠樹
- Organizer
  250th Technical Meeting of the Institute of Image Electronics Engineers of Japan
- Place of Presentation
  Sojo University, Kumamoto City
- Year and Date
  2010-03-23
[Presentation] HMM-based speaker characteristics emphasis using average voice model (2009)
- Author(s)
  Takashi Nose
- Organizer
  10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009
- Place of Presentation
  Brighton, UK
- Year and Date
  2009-09-10
[Presentation] Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM (2009)
- Author(s)
  Yusuke Ijima
- Organizer
  10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009
- Place of Presentation
  Brighton, UK
- Year and Date
  2009-09-07
[Remarks]
- URL
  http://www.kbys.ip.titech.ac.jp/
