2022 Fiscal Year Final Research Report
Cross-disciplinary approach to prosody-based automatic speech processing and its application to computer-assisted language teaching
Project/Area Number | 20K00838
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Review Section | Basic Section 02100: Foreign language education-related
Research Institution | The University of Aizu
Principal Investigator | Pyshkin Evgeny, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (50794088)
Co-Investigator (Kenkyū-buntansha) | Mozgovoy Maxim, The University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776); BLAKE John, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80635954)
Project Period (FY) | 2020-04-01 – 2023-03-31
Keywords | CAPT / prosody / speech visualization / pitch estimation / multimodal feedback
Outline of Final Research Achievements |
We completed a study on advancing CAPT systems with signal processing and speech recognition algorithms customized through computer-aided prosody modeling and visualization tools. We developed a digital signal processing core comprising pitch extraction, voice activity detection, pitch graph interpolation, and pitch estimation, the latter based on the dynamic time warping (DTW) algorithm. The current implementation supports transcription and phrasal intonation visualization, displaying model and learner pitch curves together with multimodal feedback that includes DTW-based metrics, extended phonetic transcription, and audio and video output, thus providing a foundation for further feedback tailoring with evaluative, instructive, and actionable components. The system has been assessed on several languages representing different language groups, creating a solid basis for a further multilingual, personalizable CAPT environment.
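To make the reported pipeline concrete, the following is a minimal illustrative sketch, not the project's actual implementation: it assumes the librosa library for F0 (pitch) extraction, uses placeholder audio file names, interpolates over unvoiced frames, normalizes contours to semitones, and compares a learner's pitch curve to a model curve with a plain DTW cost, in the spirit of the DTW-based metrics mentioned above.

```python
# Minimal sketch (not the project's code): model vs. learner pitch comparison via DTW.
# Assumptions: librosa is available for F0 extraction; file names are placeholders.
import numpy as np
import librosa


def pitch_contour(path, fmin=65.0, fmax=400.0):
    """Extract an F0 contour, interpolate unvoiced gaps, convert to semitones."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    t = np.arange(len(f0))
    # Fill unvoiced frames by interpolation so the curve is continuous.
    f0 = np.interp(t, t[voiced], f0[voiced])
    # Semitones relative to the utterance median: removes speaker pitch-range offset.
    return 12.0 * np.log2(f0 / np.median(f0))


def dtw_distance(a, b):
    """Plain O(len(a)*len(b)) DTW on 1-D sequences; returns a length-normalized cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)


if __name__ == "__main__":
    model = pitch_contour("model_utterance.wav")      # placeholder file names
    learner = pitch_contour("learner_attempt.wav")
    print(f"DTW-based pitch distance: {dtw_distance(model, learner):.2f} semitones/frame")
```

A score of this kind could feed the evaluative part of the feedback, while the aligned curves themselves support the visual comparison of model and learner intonation.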
Free Research Field | Human-centric software
Academic Significance and Societal Importance of the Research Achievements |
The project advances a prosody-based CAPT system that uses signal and speech processing algorithms for speech visualization and provides multimodal feedback to learners. Applying the approach to different language groups has a strong impact on improving the communication skills of language learners.