
A study on speaker-specific information extraction in consideration of vocalization mechanism and its application to speaker verification

Research Project

Project/Area Number 16K12465
Research Category

Grant-in-Aid for Challenging Exploratory Research

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution Waseda University

Principal Investigator

Ogawa Tetsuji  Waseda University, Faculty of Science and Engineering, Associate Professor (70386598)

Research Collaborator Tawara Naohiro  
Project Period (FY) 2016-04-01 – 2019-03-31
Project Status Completed (Fiscal Year 2018)
Budget Amount
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2018: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords speaker verification / feature extraction / deep learning / feature representation learning / deep neural network / speech synthesis
Outline of Final Research Achievements

This project set out to develop a neural network that learns speaker representations unaffected by phoneme information, under the assumption that speaker and phoneme information are separable in acoustic features. As a result, a disentangling neural network was developed that extracts phoneme and speaker information separately from each frame of acoustic features. The study introduced statistical pooling, which reflects utterance-level speaker information in the frame-level features, and demonstrated that pooling just before classification (i.e., late pooling) performed well. In addition, a loss function based on the entropy of the classifiers was introduced to optimize the feature extractors so that the extracted features contain only the desired speaker-specific or phoneme-specific information; this loss was shown to be effective in speaker verification.
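The two ingredients named in the outline can be illustrated with a minimal, hedged sketch. It is not the project's implementation. It assumes that statistical pooling means concatenating the per-dimension mean and standard deviation over the frames of one utterance (a common choice), and that the entropy-based loss operates on a classifier's softmax posterior. All function names and toy values below are hypothetical.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Summarize frame-level features as one utterance-level vector.

    Assumption: pooling = per-dimension mean followed by per-dimension
    (population) standard deviation over all frames of the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a classifier posterior."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Toy utterance: three frames of two-dimensional features (made-up values).
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
pooled = statistics_pooling(frames)  # [mean_0, mean_1, std_0, std_1]

# Hypothetical phoneme-classifier logits computed from a speaker feature.
# A flat posterior has high entropy, i.e. the feature reveals little
# phoneme information; a peaked posterior has low entropy.
uniform_entropy = entropy(softmax([0.0, 0.0, 0.0, 0.0]))
peaked_entropy = entropy(softmax([10.0, 0.0, 0.0, 0.0]))
```

Under these assumptions, one plausible use of the entropy term is to push a phoneme classifier's posterior on speaker features toward the flat case (and vice versa for phoneme features). The outline itself does not spell out the exact formulation, so this reading is only illustrative.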

Academic Significance and Societal Importance of the Research Achievements

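The results of this research provide a fundamental solution to the degradation of speaker verification performance caused by differences in utterance content. They make it possible to drastically improve speaker verification for short utterances of a few seconds, which has not yet reached a practical level despite high expectations for applications such as voice-based biometric authentication. In addition, this research is expected to open up a new research area aimed at clarifying, from an engineering standpoint, the "true speaker identity" that has hardly been discussed so far. This is a fundamental question in speaker recognition research, and it also offers a good opportunity to demonstrate Japan's presence in this field.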

Report

(4 results)
  • 2018 Annual Research Report   Final Research Report ( PDF )
  • 2017 Research-status Report
  • 2016 Research-status Report
  • Research Products

    (16 results)


Journal Article (6 results; of which Int'l Joint Research: 1, Peer Reviewed: 6, Open Access: 1), Presentation (8 results), Book (2 results)

  • [Journal Article] Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations (2018)

    • Author(s)
      Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi
    • Journal Title

      Proc. ICASSP2018

      Volume: - Pages: 6084-6088

    • Related Report
      2017 Research-status Report
    • Peer Reviewed
  • [Journal Article] Speaker invariant feature extraction for zero-resource languages with adversarial training (2018)

    • Author(s)
      Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
    • Journal Title

      Proc. ICASSP2018

      Volume: - Pages: 2381-2385

    • Related Report
      2017 Research-status Report
    • Peer Reviewed
  • [Journal Article] Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations (2017)

    • Author(s)
      Ashikawa Hiroto, Tawara Naohiro, Ogawa Atsunori, Iwata Tomoharu, Kobayashi Tetsunori, Ogawa Tetsuji
    • Journal Title

      Proc. APSIPA2017

      Volume: - Pages: 1263-1267

    • DOI

      10.1109/apsipa.2017.8282217

    • Related Report
      2017 Research-status Report
    • Peer Reviewed
  • [Journal Article] Associative memory model-based linear filtering and its application to tandem connectionist blind source separation (2016)

    • Author(s)
      Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi
    • Journal Title

      IEEE Trans. ASLP

      Volume: 25 Issue: 3 Pages: 637-650

    • DOI

      10.1109/taslp.2017.2653941

    • Related Report
      2017 Research-status Report 2016 Research-status Report
    • Peer Reviewed
  • [Journal Article] Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering (2016)

    • Author(s)
      Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi
    • Journal Title

      APSIPA Trans. Signal & Info. Process.

      Volume: 5 Issue: 1

    • DOI

      10.1017/atsip.2016.15

    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation (2016)

    • Author(s)
      Tetsuji Ogawa, Harish Mallidi, Emmanuel Dupoux, Jordan Cohen, Naomi Feldman, Hynek Hermansky
    • Journal Title

      Proc. ICPR2016

      Volume: - Pages: 2223-2228

    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Int'l Joint Research
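  • [Presentation] Toward the realization of a disentangling neural network for phoneme and speaker feature extraction (2019)

    • Author(s)
      Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
    • Organizer
      Acoustical Society of Japan Spring Meeting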
    • Related Report
      2018 Annual Research Report
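  • [Presentation] Feature extraction robust to speaker differences for zero-resource language speech recognition (2019)

    • Author(s)
      Yosuke Higuchi (樋口陽祐), Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
    • Organizer
      Acoustical Society of Japan Spring Meeting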
    • Related Report
      2018 Annual Research Report
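  • [Presentation] Feature extraction robust to speaker differences based on DPGMM and adversarial training, and its evaluation in zero-resource speech recognition (2019)

    • Author(s)
      Yosuke Higuchi (樋口陽祐), Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa
    • Organizer
      July 2019 Speech Research Meeting (音声研究会)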
    • Related Report
      2018 Annual Research Report
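  • [Presentation] Speaker feature extraction based on adversarial training (2018)

    • Author(s)
      Naohiro Tawara, Taira Tsuchiya, Tetsuji Ogawa, Tetsunori Kobayashi
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting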
    • Related Report
      2017 Research-status Report
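  • [Presentation] Language independence in speaker normalization and its effect on zero-resource speech recognition (2018)

    • Author(s)
      Takuya Shimada (島田拓也), Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting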
    • Related Report
      2017 Research-status Report
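  • [Presentation] Feature extraction robust to speaker differences using adversarial training and its evaluation by zero-resource phoneme discrimination (2018)

    • Author(s)
      Taira Tsuchiya, Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting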
    • Related Report
      2017 Research-status Report
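  • [Presentation] Recurrent neural network language model with domain-dependent and domain-independent internal representations (2017)

    • Author(s)
      Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi
    • Organizer
      Acoustical Society of Japan 2017 Autumn Meeting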
    • Related Report
      2017 Research-status Report
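  • [Presentation] Effectiveness of using end-of-utterance information in RNN language models for multiparty conversations (2017)

    • Author(s)
      Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa
    • Organizer
      Acoustical Society of Japan 2017 Autumn Meeting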
    • Related Report
      2017 Research-status Report
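  • [Book] Encyclopedia of Artificial Intelligence (人工知能学大辞典), edited by the Japanese Society for Artificial Intelligence; entry: Speaker recognition and speaker verification (2017)

    • Author(s)
      Tetsuji Ogawa
    • Total Pages
      2
    • Publisher
      Kyoritsu Shuppan (共立出版)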
    • Related Report
      2017 Research-status Report
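  • [Book] Speaker diarization, in Acoustics Keyword Book (音響キーワードブック), edited by the Acoustical Society of Japan (2016)

    • Author(s)
      Tetsuji Ogawa
    • Total Pages
      2
    • Publisher
      Corona Publishing (コロナ社)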
    • Related Report
      2016 Research-status Report


Published: 2016-04-21   Modified: 2020-03-30  
