2020 Fiscal Year Final Research Report

Development of high-quality speech analysis-synthesis systems with the ability to extract 3D vocal tract shape and vocal cord vibration signal precisely

Project/Area Number	17K00253
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Meijo University
Principal Investigator	Banno Hideki, Meijo University, Faculty of Science and Technology, Associate Professor (20335003)
Project Period (FY)	2017-04-01 – 2021-03-31
Keywords	3-D vocal tract shape / vocal tract area function / PARCOR analysis / formant / FDTD method
Outline of Final Research Achievements	One way to estimate vocal tract shape information from a speech signal is the PARCOR analysis-based method, which converts the PARCOR coefficients of the speech signal into a vocal tract area function. However, the estimated vocal tract area function is sometimes incorrect and cannot always represent complicated shapes. We therefore began studying a method for precisely estimating the 3-D vocal tract shape from a speech signal. First, physical 1-D vocal tract models corresponding to the estimated vocal tract area function were produced by 3-D printing, and their acoustic characteristics were measured. Second, these characteristics were compared with results from acoustic simulation methods such as the FDTD method, which computes simulated acoustic characteristics from shape information. Finally, we improved our method on the basis of these comparisons.
Free Research Field	Sound information processing
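Academic Significance and Societal Importance of the Research Achievements	There are studies that estimate articulatory parameters from speech signals, studies that generate lip images, and studies that synthesize speech from 3-D vocal tract shapes. However, research that estimates the 3-D vocal tract shape from a speech signal and then uses that shape to synthesize high-quality speech is unparalleled worldwide and highly original. Although this project did not succeed in realizing the part that estimates a detailed 3-D vocal tract shape, we consider the research highly significant because it may lead to applications such as visualizing articulation in language education and to new speech interpolation methods for applications such as voice conversion.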
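
Illustrative note on the PARCOR-based estimation mentioned in the outline: the following is a minimal sketch, not the project's implementation, of how PARCOR (reflection) coefficients can be obtained from a signal's autocorrelation by the Levinson-Durbin recursion and then mapped to the relative section areas of a lossless tube model. The function names, the unit starting area, and the sign convention A_next = A_prev * (1 - k) / (1 + k) are assumptions made for this example; other texts use the opposite sign or tube direction, which inverts the area ratios.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sequence x."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def parcor_from_autocorr(r, order):
    """Levinson-Durbin recursion.

    Returns the reflection (PARCOR) coefficients k_1..k_order for the
    prediction polynomial A(z) = 1 + a_1 z^-1 + ... + a_order z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy, must stay positive
    ks = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return ks


def area_function(ks, start_area=1.0):
    """Map reflection coefficients to relative tube section areas.

    Assumed convention (hypothetical for this sketch):
    A_next = A_prev * (1 - k) / (1 + k), starting from start_area.
    Only area ratios are meaningful; the absolute scale is arbitrary.
    """
    areas = [start_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


# Usage: autocorrelation of an idealized first-order process, r[m] = 0.5**m.
r_example = [1.0, 0.5, 0.25]
ks_example = parcor_from_autocorr(r_example, order=2)
areas_example = area_function(ks_example)
```

Because the mapping fixes only the ratios between neighboring sections, an area function obtained this way is a 1-D description of the tract, which is consistent with the outline's remark that it cannot always represent complicated shapes.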