Project/Area Number |
16K12465
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Perceptual information processing
|
Research Institution | Waseda University |
Principal Investigator |
Ogawa Tetsuji, Waseda University, Faculty of Science and Engineering, Associate Professor (70386598)
|
Research Collaborator |
Tawara Naohiro
|
Project Period (FY) |
2016-04-01 – 2019-03-31
|
Project Status |
Completed (Fiscal Year 2018)
|
Budget Amount |
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2018: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
|
Keywords | speaker verification / feature extraction / deep learning / feature representation learning / deep neural network / speech synthesis |
Outline of Final Research Achievements |
This project aimed to develop a neural network that learns speaker representations unaffected by phoneme information, under the assumption that speaker and phoneme information are separable in acoustic features. As a result, a disentangling neural network was developed that extracts phoneme and speaker information separately from each frame of acoustic features. The study introduced statistical pooling, which reflects utterance-level speaker information in the frame-level features, and demonstrated that pooling just before classification (i.e., late pooling) performed well. In addition, a loss function based on the entropy of the classifiers was introduced to optimize the feature extractors so that the extracted features contain only the desired speaker-specific or phoneme-specific information; this loss was shown to be effective in speaker verification.
|
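The two ingredients named in the outline, statistical pooling and an entropy-based criterion, can be illustrated with a minimal sketch. This is not the project's implementation. The function names (`stats_pool`, `softmax`, `entropy`), the choice of mean and standard deviation as the pooled statistics, and the reading of the entropy loss as "a phoneme classifier fed speaker features should be maximally uncertain" are all assumptions made here for illustration.

```python
import math

def stats_pool(frames):
    """Pool frame-level features into one utterance-level vector.

    Assumption: the pooled statistics are the per-dimension mean and
    (population) standard deviation over time, concatenated.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds

def softmax(logits):
    """Numerically stable softmax over a list of classifier logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a classifier posterior."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Two frames with two feature dimensions each.
pooled = stats_pool([[1.0, 2.0], [3.0, 2.0]])  # -> [2.0, 2.0, 1.0, 0.0]

# Hypothetical phoneme-classifier logits computed from speaker features.
# A flat posterior has maximal entropy (log of the number of classes),
# which under the assumption above means no phoneme information leaks.
uninformative = entropy(softmax([0.0, 0.0, 0.0, 0.0]))
confident = entropy(softmax([10.0, 0.0, 0.0, 0.0]))
```

Under this reading, a feature extractor would be trained to push `uninformative`-style posteriors on the undesired task while keeping the desired classifier accurate. Where the pooling is placed in the network (for example, late pooling just before classification) is an architectural choice that this sketch does not model.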
Academic Significance and Societal Importance of the Research Achievements |
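The results of this research provide a fundamental solution to the degradation of speaker verification performance caused by differences in utterance content. They make it possible to drastically improve speaker verification for short utterances of a few seconds, which is highly anticipated in applications such as voice-based biometric authentication but has not yet reached a practical level. In addition, this research is expected to open up a new research area that clarifies, from an engineering standpoint, the "true speaker identity" that has hardly been discussed so far. This is a fundamental question in speaker recognition research, and it also offers a good opportunity to demonstrate Japan's presence in the field.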
|