2019 Fiscal Year Final Research Report

Research for unsupervised acoustic pattern discovery with zero resources

Research Project

Project/Area Number	17K00237
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Nara Institute of Science and Technology
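Principal Investigator	Sakti Sakriani, Nara Institute of Science and Technology, Graduate School of Science and Technology, Specially Appointed Associate Professor (00395005)
Co-Investigator(Kenkyū-buntansha)	Nakamura Satoshi (中村哲), Nara Institute of Science and Technology, Data Science Center, Professor (30263429)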
Project Period (FY)	2017-04-01 – 2020-03-31
Keywords	Speech recognition / Zero-resource speech technology / Brain waves (EEG) / Speech translation
Outline of Final Research Achievements	With the Tokyo Olympics and Paralympics approaching, the language barriers that tourists face are becoming a critical problem to overcome. Speech recognition and speech translation are readily available today, but only for the few languages with large resources. Here, we addressed the zero-resource speech problem, in which neither language-specific knowledge nor transcribed data are available. To understand how an unknown language can be learned, we analyzed and investigated how the human brain processes language. In addition, we developed a closed-loop speech chain model based on deep learning so that the machine can learn how to listen while it is speaking. This is the first deep learning model that integrates human speech recognition and production behavior.
Free Research Field	Informatics
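Academic Significance and Societal Importance of the Research Achievements	We succeeded in building zero-resource models for an African language (Tsonga) and for Indonesian languages. We also participated in the 2017 and 2019 Zero Resource Speech Challenges, where the proposed methods achieved top-ranking results. In addition, we developed a closed-loop speech chain model based on deep learning so that the machine can learn how to listen while it is speaking. In 2019, we also collaborated with UNESCO on a world language consortium. The results of this research were published at top conferences (ASRU, Interspeech, ICASSP) and in a top journal (IEEE/ACM TASLP). A patent was also obtained for the speech chain model.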