2022 Fiscal Year Final Research Report

Cross-disciplinary approach to prosody-based automatic speech processing and its application to computer-assisted language teaching

Project/Area Number	20K00838
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 02100: Foreign language education-related
Research Institution	The University of Aizu
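Principal Investigator	Pyshkin Evgeny, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (50794088)
Co-Investigator(Kenkyū-buntansha)	Mozgovoy Maxim, The University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776); BLAKE John, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80635954)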
Project Period (FY)	2020-04-01 – 2023-03-31
Keywords	CAPT / prosody / speech visualization / pitch estimation / multimodal feedback
Outline of Final Research Achievements	We studied how computer-assisted pronunciation training (CAPT) systems can be advanced by combining signal processing and speech recognition algorithms and customizing them through computer-aided prosody modeling and visualization tools. We developed a digital signal processing core comprising pitch extraction, voice activity detection, pitch graph interpolation, and pitch estimation, the last of which is based on the dynamic time warping (DTW) algorithm. The current implementation supports transcription and visualization of phrasal intonation as model and user pitch curves, accompanied by multimodal feedback that includes DTW-based metrics, extended phonetic transcription, and audio and video output. This provides a foundation for further tailoring of feedback with evaluative, instructive, and actionable components. The system has been assessed for several languages representing different language groups, which lays the groundwork for a future multilingual, personalizable CAPT environment.
Free Research Field	Human-centric software
Academic Significance and Societal Importance of the Research Achievements	The project advances a prosody-based CAPT system that uses signal and speech processing algorithms to visualize speech and to provide multimodal feedback to learners. Applying the approach to different language groups contributes to improving the communication skills of language learners.
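
The outline above mentions pitch graph interpolation and DTW-based metrics for comparing model and user pitch curves. The following is a minimal sketch of how such a comparison might look, not the project's actual implementation. The function names, the use of `None` for unvoiced frames, the semitone normalization, and the division of the DTW cost by the combined contour length are all assumptions made for illustration; the report does not specify these details.

```python
import math


def interpolate_gaps(pitch):
    """Fill unvoiced frames (None) by linear interpolation between voiced
    neighbours; leading and trailing gaps copy the nearest voiced value.
    (Hypothetical helper, assumed for this sketch.)"""
    voiced = [i for i, v in enumerate(pitch) if v is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = list(pitch)
    for i in range(voiced[0]):
        out[i] = pitch[voiced[0]]
    for i in range(voiced[-1] + 1, len(pitch)):
        out[i] = pitch[voiced[-1]]
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = pitch[a] + t * (pitch[b] - pitch[a])
    return out


def to_semitones(pitch_hz):
    """Convert Hz to semitones relative to the contour's own mean, so that
    speakers with different voice registers become comparable.
    (An assumed normalization, not taken from the report.)"""
    st = [12.0 * math.log2(f) for f in pitch_hz]
    mean = sum(st) / len(st)
    return [s - mean for s in st]


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost.
    The accumulated cost is divided by len(a) + len(b), one common
    length normalization (an assumption of this sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def compare_contours(model_hz, user_hz):
    """Score a user pitch contour against a model contour (lower is closer)."""
    model = to_semitones(interpolate_gaps(model_hz))
    user = to_semitones(interpolate_gaps(user_hz))
    return dtw_distance(model, user)
```

In this sketch, a user contour that follows the same melodic shape as the model but is spoken more slowly or in a different register yields a score near zero, which is the property that makes DTW attractive for intonation feedback.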