2020 Fiscal Year Final Research Report

Safe and secure speech information processing based on liveness detection and ASVspoof challenge

Research Project

Project/Area Number	18KT0051
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund
Section	Generative Research Fields
Research Field	The Information Society and Trust
Research Institution	National Institute of Informatics
Principal Investigator	Yamagishi Junichi National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (70709352)
Project Period (FY)	2018-07-18 – 2021-03-31
Keywords	speech information processing / speaker verification / liveness detection / biometric authentication / voice interface
Outline of Final Research Achievements	As speech processing has become widespread in society, attacks on speaker verification and speech recognition systems have begun to occur. The purpose of this research is to improve liveness detection technology for speech and thereby present a solution to this problem. Liveness detection is a machine learning technology that distinguishes between "voice obtained by another person without permission, processed, and reproduced by an external device" and "live voice uttered on the spot by a living person". To this end, we built a database (DB) containing a large number of audio files synthesized with the latest speech synthesis and voice conversion technologies, held a competition, and advanced liveness detection technology across the field. As a result, it has become possible to detect artificial voices that listeners cannot perceptually distinguish from natural speech, and a solution that realizes safe and secure speech information processing has been obtained.
Free Research Field	Speech information processing
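Academic Significance and Societal Importance of the Research Achievements	Speech information processing is used in many smart devices and is a foundational technology that supports society. Liveness detection for speech achieves both the convenience of voice interfaces and trust at the same time, and therefore has high societal significance. In fact, the DB built and publicly released through this research is used not only by academic organizations around the world but also by many companies. Its academic significance is also high, as many international conference papers use this DB. There is currently concern that media generated by AI technology will be misused, and such media are sometimes called deepfakes. Although this research targeted speech, its results are considered applicable to video, text, and other media, and it is expected that they can be developed further in the future.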