A study on speaker-specific information extraction in consideration of vocalization mechanism and its application to speaker verification

Research Project

Project/Area Number	16K12465
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	Waseda University
Principal Investigator	Ogawa Tetsuji, Waseda University, Faculty of Science and Engineering, Associate Professor (70386598)
Research Collaborator	Tawara Naohiro
Project Period (FY)	2016-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000); Fiscal Year 2018: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords	speaker verification / feature extraction / deep learning / feature representation learning / deep neural network / speech synthesis
Outline of Final Research Achievements	This project set out to develop a neural network that learns speaker representations unaffected by phoneme information, under the assumption that speaker and phoneme information are separable in acoustic features. As a result, a disentangling neural network was successfully developed that extracts phoneme and speaker information separately from each frame of acoustic features. The study introduced statistical pooling, which reflects utterance-level speaker information in the frame-level features, and demonstrated that pooling just before classification (i.e., late pooling) performed well. In addition, a loss function based on the entropy of classifiers was introduced to optimize the feature extractors so that the extracted features contain only the desired speaker-specific or phoneme-specific information; this loss was shown to be effective in speaker verification.
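Academic Significance and Societal Importance of the Research Achievements	The results of this research offer a fundamental solution to the degradation of speaker verification performance caused by differences in utterance content. They make it possible to drastically improve speaker verification for short utterances of only a few seconds, a task that is highly anticipated for applications such as voice-based biometric authentication but has not yet reached a practical level. In addition, this research is expected to open up a new research area that clarifies, from an engineering standpoint, the "true speaker individuality" that has hardly been discussed so far. This is a fundamental question in speaker recognition research and also a good opportunity to demonstrate Japan's presence in the field.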
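
As a rough illustration of two ideas named in the outline, the following Python sketch computes a statistics-pooling vector (per-dimension mean and standard deviation over the frames of one utterance) and an entropy-based penalty on a classifier posterior. It is not the project's implementation: the function names, the choice of mean-plus-standard-deviation pooling, and the specific penalty form (log K minus the posterior entropy, which is zero when a classifier is maximally uncertain) are all assumptions made for illustration.

```python
import math


def stats_pool(frames):
    """Pool frame-level features into one utterance-level vector.

    Assumed form: per-dimension mean followed by per-dimension
    (population) standard deviation over all frames.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return mean + std


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_penalty(logits):
    """Hypothetical penalty: log(K) minus the posterior entropy.

    It is zero when the classifier output is uniform over K classes,
    i.e. when the input features carry no usable information about
    the class that this classifier tries to predict.
    """
    return math.log(len(logits)) - entropy(softmax(logits))


# Two frames of 2-dimensional features from one utterance.
pooled = stats_pool([[1.0, 2.0], [3.0, 6.0]])

# A phoneme classifier that is uncertain vs. one that is confident.
uniform_penalty = entropy_penalty([0.0, 0.0, 0.0, 0.0])
peaked_penalty = entropy_penalty([10.0, 0.0, 0.0, 0.0])
```

In a late-pooling arrangement, a function like `stats_pool` would be applied to the frame-level features just before the classifier. Under the assumed penalty, minimizing `entropy_penalty` for a phoneme classifier fed with speaker features would push those features toward carrying no phoneme information.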

Report

(4 results)

2018 Annual Research Report / Final Research Report (PDF)
2017 Research-status Report
2016 Research-status Report

Research Products
(16 results)


Journal Article (6 results; of which Int'l Joint Research: 1 result, Peer Reviewed: 6 results, Open Access: 1 result) / Presentation (8 results) / Book (2 results)

[Journal Article] Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations (2018)
- Author(s)
  Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi
- Journal Title
  Proc. ICASSP2018
  Volume: - Pages: 6084-6088
- Related Report
  2017 Research-status Report
- Peer Reviewed
[Journal Article] Speaker invariant feature extraction for zero-resource languages with adversarial training (2018)
- Author(s)
  Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
- Journal Title
  Proc. ICASSP2018
  Volume: - Pages: 2381-2385
- Related Report
  2017 Research-status Report
- Peer Reviewed
[Journal Article] Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations (2017)
- Author(s)
  Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa
- Journal Title
  Proc. APSIPA2017
  Volume: - Pages: 1263-1267
- DOI
  10.1109/apsipa.2017.8282217
- Related Report
  2017 Research-status Report
- Peer Reviewed
[Journal Article] Associative memory model-based linear filtering and its application to tandem connectionist blind source separation (2016)
- Author(s)
  Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi
- Journal Title
  IEEE Trans. ASLP
  Volume: 25 Issue: 3 Pages: 637-650
- DOI
  10.1109/taslp.2017.2653941
- Related Report
  2017 Research-status Report 2016 Research-status Report
- Peer Reviewed
[Journal Article] Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering (2016)
- Author(s)
  Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi
- Journal Title
  APSIPA Trans. Signal & Info. Process.
  Volume: 5 Issue: 1
- DOI
  10.1017/atsip.2016.15
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation (2016)
- Author(s)
  Tetsuji Ogawa, Harish Mallidi, Emmanuel Dupoux, Jordan Cohen, Naomi Feldman, Hynek Hermansky
- Journal Title
  Proc. ICPR2016
  Volume: - Pages: 2223-2228
- Related Report
  2016 Research-status Report
- Peer Reviewed / Int'l Joint Research
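[Presentation] Toward a disentangling neural network for phoneme and speaker feature extraction (2019)
- Author(s)
  Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
- Organizer
  Spring Meeting of the Acoustical Society of Japan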
- Related Report
  2018 Annual Research Report
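[Presentation] Speaker-robust feature extraction for zero-resource language speech recognition (2019)
- Author(s)
  Yosuke Higuchi, Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
- Organizer
  Spring Meeting of the Acoustical Society of Japan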
- Related Report
  2018 Annual Research Report
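[Presentation] Speaker-robust feature extraction based on DPGMM and adversarial training and its evaluation in zero-resource speech recognition (2019)
- Author(s)
  Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
- Organizer
  July 2019 Speech Research Meeting (音声研究会)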
- Related Report
  2018 Annual Research Report
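[Presentation] Speaker feature extraction based on adversarial training (2018)
- Author(s)
  Naohiro Tawara, Taira Tsuchiya, Tetsuji Ogawa, Tetsunori Kobayashi
- Organizer
  2018 Spring Meeting of the Acoustical Society of Japan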
- Related Report
  2017 Research-status Report
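[Presentation] Language independence of speaker normalization and its effect on zero-resource speech recognition (2018)
- Author(s)
  Takuya Shimada (島田拓也), Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
- Organizer
  2018 Spring Meeting of the Acoustical Society of Japan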
- Related Report
  2017 Research-status Report
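[Presentation] Speaker-robust feature extraction using adversarial training and its evaluation by zero-resource phoneme classification (2018)
- Author(s)
  Taira Tsuchiya, Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
- Organizer
  2018 Spring Meeting of the Acoustical Society of Japan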
- Related Report
  2017 Research-status Report
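[Presentation] Recurrent neural network language model with domain-dependent and domain-independent internal representations (2017)
- Author(s)
  Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi
- Organizer
  2017 Autumn Meeting of the Acoustical Society of Japan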
- Related Report
  2017 Research-status Report
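[Presentation] Effectiveness of using end-of-utterance information in RNN language models for multiparty conversations (2017)
- Author(s)
  Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa
- Organizer
  2017 Autumn Meeting of the Acoustical Society of Japan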
- Related Report
  2017 Research-status Report
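[Book] "Speaker recognition and speaker verification," in Encyclopedia of Artificial Intelligence (人工知能学大辞典), Japanese Society for Artificial Intelligence (ed.) (2017)
- Author(s)
  Tetsuji Ogawa
- Total Pages
  2
- Publisher
  Kyoritsu Shuppan (共立出版)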
- Related Report
  2017 Research-status Report
[Book] "Speaker diarization," in Acoustics Keyword Book (音響キーワードブック), Acoustical Society of Japan (ed.) (2016)
- Author(s)
  Tetsuji Ogawa
- Total Pages
  2
- Publisher
  Corona Publishing (コロナ社)
- Related Report
  2016 Research-status Report
