2021 Fiscal Year Final Research Report

Infrastructure for analyzing the prosody of speaker-mixed speech for modeling daily conversation

Project/Area Number	19H01252
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 02060:Linguistics-related
Research Institution	Utsunomiya University
Principal Investigator	Mori Hiroki, Utsunomiya University, School of Engineering, Associate Professor (10302184)
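Co-Investigator(Kenkyū-buntansha)	Maekawa Kikuo (前川喜久雄), Inter-University Research Institute Corporation National Institutes for the Humanities, National Institute for Japanese Language and Linguistics, Spoken Language Research Division, Professor (20173693); Koiso Hanae (小磯花絵), Inter-University Research Institute Corporation National Institutes for the Humanities, National Institute for Japanese Language and Linguistics, Spoken Language Research Division, Professor (30312200); Ono Nobutaka (小野順貴), Tokyo Metropolitan University, Graduate School of Systems Design, Professor (80334259); Nagata Tomohiro (永田智洋), Teikyo University, Faculty of Science and Engineering, Assistant Professor (80823450)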
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	Neural F0 model / Source separation / Speaker embedding
Outline of Final Research Achievements	This project aimed to establish a fundamental technology for estimating pitch independently for each speaker from overlapping speech recorded in everyday circumstances, and achieved the following: (1) We developed a speech separation method that takes the movement of speakers or microphones into account. It suppresses the components of speakers other than the target and is therefore expected to improve the accuracy of subsequent pitch estimation. A listening test on the Corpus of Everyday Japanese Conversation confirmed its effectiveness. (2) We developed a novel deep learning method for extracting the pitch of a specified speaker. Evaluation experiments on overlapping speech showed that the proposed method reduced the gross pitch error by more than 60% compared with the case in which it was not applied.
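The report does not define the gross pitch error (GPE) it uses. As a point of reference, the sketch below assumes the common definition: among frames that are voiced in both the reference and the estimated F0 track (F0 > 0), GPE is the fraction whose estimate deviates from the reference by more than 20%. It also shows how a relative reduction such as "more than 60%" is computed. The function names, the 20% threshold, and the toy F0 values are illustrative assumptions, not the project's code or data.

```python
def gross_pitch_error(ref_f0, est_f0, threshold=0.2):
    """GPE under an assumed common definition: fraction of frames voiced in
    both tracks (F0 > 0) whose relative F0 deviation exceeds `threshold`."""
    if len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must have the same number of frames")
    # Keep only frames where both reference and estimate are voiced.
    both_voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0
    # A frame is a gross error if |est - ref| / ref exceeds the threshold.
    gross = sum(1 for r, e in both_voiced if abs(e - r) / r > threshold)
    return gross / len(both_voiced)


def relative_reduction(baseline, proposed):
    """Relative reduction of an error rate, e.g. 0.6 -> 0.2 gives about 0.667."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical toy F0 tracks in Hz (0 = unvoiced); not data from the report.
    ref = [0, 120, 125, 130, 0, 200, 210]
    est_baseline = [0, 240, 126, 65, 0, 200, 105]   # octave errors from an interfering voice
    est_proposed = [0, 121, 124, 128, 0, 198, 105]  # one remaining gross error
    gpe_base = gross_pitch_error(ref, est_baseline)
    gpe_prop = gross_pitch_error(ref, est_proposed)
    print(f"GPE baseline: {gpe_base:.1%}, proposed: {gpe_prop:.1%}")
    print(f"Relative reduction: {relative_reduction(gpe_base, gpe_prop):.1%}")
```

In this toy example, three of five jointly voiced frames are gross errors for the baseline and one of five for the proposed track. The relative reduction is therefore (0.6 - 0.2) / 0.6, or about 66.7%. Octave errors like 240 Hz for a 120 Hz reference are the kind of mistake an interfering speaker tends to cause.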
Free Research Field	Spoken language information processing
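Academic Significance and Societal Importance of the Research Achievements	In corpora that record conversations arising naturally in everyday settings, driven by the participants' own motives and purposes, the speakers' voices are not acoustically separated, so each recording also picks up voices other than the target speaker's. In everyday conversation the utterances of multiple speakers frequently overlap, and in those portions the prosodic features of speech cannot be analyzed accurately. The results of this research lay the groundwork for techniques that separate each speaker's prosodic information from such speaker-mixed speech. Applied to data recorded in real environments, these techniques are expected to benefit a wide range of fields, including phonetics, the social sciences, psychology, and speech information processing.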