Interactive audiobook using statistical parametric speech synthesis and collective intelligence

Research Project

Project/Area Number	15K12071
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	National Institute of Informatics
Principal Investigator	Yamagishi Junichi National Institute of Informatics, Digital Content and Media Sciences Research Division, Associate Professor (70709352)
Co-Investigator(Renkei-kenkyūsha)	TAKAKI Shinji National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (50735090)
Project Period (FY)	2015-04-01 – 2017-03-31
Project Status	Completed (Fiscal Year 2016)
Budget Amount	¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000); Fiscal Year 2016: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2015: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	Speech synthesis / Audiobook / Collective intelligence / Machine learning / Interactive / Deep learning / Speech information processing
Outline of Final Research Achievements	Many e-book readers now include speech synthesis functions, so users can listen to e-books as well as read them. Statistical parametric speech synthesis can flexibly generate synthetic speech in a variety of voices and speaking styles. If it is combined with e-book readers, e-books could become a platform on which users interactively control the expressiveness of the synthetic speech. To this end, we advanced acoustic modeling techniques based on factorizing speech transformation functions. Specifically, we explicitly factorized speaker and emotion transformations and proposed a new adaptation algorithm that transplants emotion transformations estimated from one speaker to another speaker. We also constructed a new system in which the speaker's gender and age are factorized. Finally, we built a prototype e-book reader based on the proposed speech synthesis techniques to demonstrate these ideas.

Report

(3 results)

2016 Annual Research Report
Final Research Report (PDF)
2015 Research-status Report

Research Products
(11 results)

Int'l Joint Research (4 results) / Journal Article (1 result; of which Int'l Joint Research: 1, Peer Reviewed: 1) / Presentation (6 results; of which Int'l Joint Research: 6)

[Int'l Joint Research] Technical University of Madrid (Spain)
- Related Report
  2016 Annual Research Report
[Int'l Joint Research] Aalto University (Finland)
- Related Report
  2016 Annual Research Report
[Int'l Joint Research] Technical University of Madrid (Spain)
- Related Report
  2015 Research-status Report
[Int'l Joint Research] University of Edinburgh (UK)
- Related Report
  2015 Research-status Report
[Journal Article] Emotion transplantation through adaptation in HMM-based speech synthesis (2015)
- Author(s)
  Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Rubén San-Segundo, Javier Ferreiros, Junichi Yamagishi, Juan M. Montero
- Journal Title
  Computer Speech & Language
  Volume: 34, Issue: 1, Pages: 292-307
- DOI
  10.1016/j.csl.2015.03.008
- Related Report
  2015 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Presentation] ADAPTING AND CONTROLLING DNN-BASED SPEECH SYNTHESIS USING INPUT CODES (2017)
- Author(s)
  Hieu-Thi Luong, Shinji Takaki, Gustav Eje Henter, Junichi Yamagishi
- Organizer
  The 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)
- Place of Presentation
  HILTON NEW ORLEANS RIVERSIDE (New Orleans, USA)
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM (2016)
- Author(s)
  Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Ascension Gallardo-Antolin, Junichi Yamagishi, Juan M. Montero
- Organizer
  The 26th International Conference on Computational Linguistics (COLING 2016)
- Place of Presentation
  Osaka, Japan
- Year and Date
  2016-12-13
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis (2016)
- Author(s)
  Shinji Takaki, SangJin Kim, Junichi Yamagishi
- Organizer
  The 9th ISCA Workshop on Speech Synthesis (SSW-9)
- Place of Presentation
  Plug and Play Tech Center (Sunnyvale, USA)
- Year and Date
  2016-09-13
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] WAVELET-BASED DECOMPOSITION OF F0 AS A SECONDARY TASK FOR DNN-BASED SPEECH SYNTHESIS WITH MULTI-TASK LEARNING (2016)
- Author(s)
  Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi, Robert A. J. Clark
- Organizer
  ICASSP 2016
- Place of Presentation
  Shanghai, China
- Year and Date
  2016-03-20
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] The NII speech synthesis entry for Blizzard Challenge 2016 (2016)
- Author(s)
  Lauri Juvela, Xin Wang, Shinji Takaki, SangJin Kim, Manu Airaksinen, Junichi Yamagishi
- Organizer
  Blizzard Challenge workshop 2016
- Place of Presentation
  De Anza 3 Theater, Apple Inc (Cupertino, USA)
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis (2015)
- Author(s)
  Manuel Sam Ribeiro, Junichi Yamagishi, Robert A. J. Clark
- Organizer
  Interspeech 2015
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06
- Related Report
  2015 Research-status Report
- Int'l Joint Research
