A study of speech information processing based on mathematical models for speaker and linguistic information and their probabilistic integration

Research Project

Project/Area Number	25730105
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	The University of Tokyo
Principal Investigator	SAITO DAISUKE The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (40615150)
Project Period (FY)	2013-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000) Fiscal Year 2015: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000) Fiscal Year 2014: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000) Fiscal Year 2013: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
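Keywords	speech information processing / voice conversion / speaker identification / matrix variate / language identification / tensor analysis / speaker recognition / language recognition / relative relational features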
Outline of Final Research Achievements	In this study, to achieve more sophisticated speech information processing, we developed mathematical models that separate speech into linguistic information and speaker information, together with a framework that integrates these models. We proposed a speech representation based on tensor analysis and applied it to language identification and speaker identification. We also developed a new voice conversion framework based on matrix variate probability distributions.

Report

(4 results)

2015 Annual Research Report
Final Research Report (PDF)
2014 Research-status Report
2013 Research-status Report

Research Products
(20 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result) Presentation (19 results) (of which Int'l Joint Research: 1 result)

[Journal Article] Eigenvoice-Based Character Conversion for Arbitrary Speakers Using Various Character Voices of a Skilled Voice Actor (2013)
- Author(s)
  T. Pongkittiphan, D. Saito, N. Minematsu, K. Hirose
- Journal Title
  Journal of Signal Processing
  Volume: 17 Issue: 4 Pages: 139-142
- DOI
  10.2299/jsp.17.139
- NAID
  130004849293
- ISSN
  1342-6230, 1880-1013
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Presentation] A Study on Impression-Based Correspondence between Voice and Face Eigenspaces Using GMMs (2016)
- Author(s)
  大杉康仁, 齋藤大輔, 峯松信明
- Organizer
  Ongaku Symposium 2016
- Place of Presentation
  Tokai University (Minato, Tokyo)
- Year and Date
  2016-05-21
- Related Report
  2015 Annual Research Report
[Presentation] Speech Representation Based on Tensor Decomposition and Its Application to Language Identification and Speaker Identification (2016)
- Author(s)
  鈴木颯, 齋藤大輔, 峯松信明
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Beppu International Convention Center (Beppu, Oita)
- Year and Date
  2016-03-28
- Related Report
  2015 Annual Research Report
[Presentation] Arbitrary-Speaker Voice Conversion with Deep Neural Networks Using Basis Components of the Speaker Space (2016)
- Author(s)
  橋本哲弥, 柏木陽佑, 齋藤大輔, 峯松信明
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] Integration of Multi-Speaker Training and Speaker Adaptation for DBLSTM-RNN-based Text-To-Speech Synthesis (2016)
- Author(s)
  Yi Zhao, Nobuaki Minematsu, Daisuke Saito
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] Voice Conversion Considering Multi-Frame Features Based on a Matrix Variate Gaussian Mixture Model (2016)
- Author(s)
  楊奕, 内田秀継, 齋藤大輔, 峯松信明
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] Voice Conversion by Projection onto Speaker-Space Bases Using a Deep Neural Network (2015)
- Author(s)
  橋本哲弥, 柏木陽佑, 齋藤大輔, 峯松信明
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Nagoya Institute of Technology (Nagoya, Aichi)
- Year and Date
  2015-12-02
- Related Report
  2015 Annual Research Report
[Presentation] MULTI-SPEAKER SPEECH SYNTHESIS AND SPEAKER ADAPTATION BASED ON DEEP BIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK (2015)
- Author(s)
  Yi Zhao, Nobuaki Minematsu, Daisuke Saito
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Nagoya Institute of Technology (Nagoya, Aichi)
- Year and Date
  2015-12-02
- Related Report
  2015 Annual Research Report
[Presentation] A Study on Language Identification Using a Linguistic Information Representation Based on Tensor Decomposition (2015)
- Author(s)
  鈴木颯, 齋藤大輔, 峯松信明
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  University of Aizu (Aizuwakamatsu, Fukushima)
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
[Presentation] Adding Artificiality to Natural Speech Using Gaussian Mixture Models (2015)
- Author(s)
  小林航也, 齋藤大輔, 峯松信明, 広瀬啓吉
- Organizer
  Ongaku Symposium 2015
- Place of Presentation
  The University of Electro-Communications (Chofu, Tokyo)
- Year and Date
  2015-05-23
- Related Report
  2015 Annual Research Report
[Presentation] SAS: A speaker verification spoofing database containing diverse attacks (2015)
- Author(s)
  Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, Simon King
- Organizer
  ICASSP
- Place of Presentation
  Brisbane (Australia)
- Year and Date
  2015-04-19
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] A Study on Speaker Identification Using a Speaker Information Representation Based on Tensor Decomposition (2015)
- Author(s)
  チン・トゥアン・トゥー, 齋藤大輔, 峯松信明, 広瀬啓吉
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Chuo University, Tokyo
- Year and Date
  2015-03-16 – 2015-03-18
- Related Report
  2014 Research-status Report
[Presentation] Voice Conversion Based on Matrix Variate Gaussian Mixture Model (2014)
- Author(s)
  Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
- Organizer
  IEEE ICSP2014
- Place of Presentation
  Hangzhou, China
- Year and Date
  2014-10-19 – 2014-10-23
- Related Report
  2014 Research-status Report
[Presentation] Application of Matrix Variate Gaussian Mixture Model to Statistical Voice Conversion (2014)
- Author(s)
  Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
- Organizer
  ISCA INTERSPEECH 2014
- Place of Presentation
  Singapore, Singapore
- Year and Date
  2014-09-14 – 2014-09-18
- Related Report
  2014 Research-status Report
[Presentation] Tensor representation for speaker characteristics in speech (2014)
- Author(s)
  Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
- Organizer
  Forum Acusticum
- Place of Presentation
  Krakow, Poland
- Year and Date
  2014-09-07 – 2014-09-12
- Related Report
  2014 Research-status Report
[Presentation] Many-to-One Voice Conversion by Deep Learning Using Speaker-Dependent Subnetworks (2014)
- Author(s)
  橋本哲弥, 柏木陽佑, 齋藤大輔, 広瀬啓吉, 峯松信明
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University, Hokkaido
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Research-status Report
[Presentation] Mixture Model of Matrix Variate Normal Distributions and Its Application to Voice Conversion (2014)
- Author(s)
  齋藤大輔, 土井秀信, 峯松信明, 広瀬啓吉
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Hotel Hanamaki, Iwate
- Year and Date
  2014-07-24 – 2014-07-26
- Related Report
  2014 Research-status Report
[Presentation] A Study on Voice Conversion Based on a Matrix Variate Gaussian Mixture Distribution (2014)
- Author(s)
  土井秀信, 齋藤大輔, 峯松信明, 広瀬啓吉
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University, Ochanomizu, Tokyo
- Related Report
  2013 Research-status Report
[Presentation] A Study on Language Identification Using Structural Representation and GMM Supervectors (2014)
- Author(s)
  鈴木颯, 齋藤大輔, 峯松信明, 広瀬啓吉
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University, Ochanomizu, Tokyo
- Related Report
  2013 Research-status Report
[Presentation] A Study on Speech Feature Enhancement Based on the Noisy Channel Model (2014)
- Author(s)
  バン・フクアンフイ, 齋藤大輔, 柏木陽佑, 峯松信明, 広瀬啓吉
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Nihon University, Ochanomizu, Tokyo
- Related Report
  2013 Research-status Report
