2020 Fiscal Year Research-status Report
Cross-disciplinary approach to prosody-based automatic speech processing and its application to computer-assisted language teaching
Project/Area Number |
20K00838
|
Research Institution | The University of Aizu |
Principal Investigator |
Pyshkin Evgeny University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (50794088)
|
Co-Investigator(Kenkyū-buntansha) |
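Mozgovoy Maxim University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776)
BLAKE John University of Aizu, School of Computer Science and Engineering, Associate Professor (80635954)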
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | Speech processing / CAPT / Audio-visual feedback / ASR / Language prosody |
Outline of Annual Research Achievements |
We completed a study on the potential of teaching pronunciation with speech processing algorithms and on individualizing it through computer-aided prosody modeling and visualization instruments. We applied voice activity detection and equipped our StudyIntonation learning environment with automated speech recognition algorithms. Given the recognized phonemes together with their duration and energy, the rhythmic pattern of a phrase can be retrieved. Transcription and phrasal rhythm are visualized alongside phrasal intonation, which is shown by pitch curves. We reorganised the CAPT courseware to represent each task as a hierarchical phonological structure that contains an intonation curve, a rhythmic pattern and an IPA transcription. We also started a project on adapting StudyIntonation to the particular case of tonal languages.
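The task structure and the rhythm retrieval step described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Phoneme`, `Task`, `rhythmic_pattern` and `build_task` are hypothetical and do not come from StudyIntonation, and the prominence measure (duration multiplied by energy, normalised to sum to 1) is an assumed stand-in, since the report does not say how the rhythmic pattern is computed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Phoneme:
    """One recognized phoneme (hypothetical type, not the project's own)."""
    symbol: str      # IPA symbol
    duration: float  # length in seconds
    energy: float    # mean energy, arbitrary units


@dataclass
class Task:
    """Hypothetical container for the three layers named in the report."""
    ipa: str                  # IPA transcription of the phrase
    pitch_curve: List[float]  # intonation curve as F0 samples in Hz
    rhythm: List[float]       # relative prominence of each phoneme


def rhythmic_pattern(phonemes: Sequence[Phoneme]) -> List[float]:
    """Assumed measure: duration * energy per phoneme, normalised to sum to 1."""
    raw = [p.duration * p.energy for p in phonemes]
    total = sum(raw)
    if total == 0:
        # Silent or empty input: no phoneme is more prominent than another.
        return [0.0 for _ in raw]
    return [r / total for r in raw]


def build_task(phonemes: Sequence[Phoneme], pitch_curve: Sequence[float]) -> Task:
    """Assemble transcription, pitch curve and rhythm into one task record."""
    return Task(
        ipa="".join(p.symbol for p in phonemes),
        pitch_curve=list(pitch_curve),
        rhythm=rhythmic_pattern(phonemes),
    )
```

Under this assumed measure, a long and loud phoneme receives a larger share of the pattern than a short and quiet one, which is the kind of contrast a rhythm visualization would need to expose.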
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Early design assessments demonstrate both the high potential of the StudyIntonation environment and the improvements required to create a convenient, intuitive and interactive CAPT environment. The usability of CAPT tools increases if they can display the features of natural connected speech, such as elision, assimilation, deletion and juncture. At the word level, the following pronunciation aspects can be trained: stress positioning; the effects of stressed and unstressed syllables, e.g. vowel reduction; and tone movement. Correspondingly, at the phrasal level, learners can observe sentence accent placement, rhythmic pattern production, and phrasal intonation movements related to communicative functions. The practical purpose of the StudyIntonation project is twofold: first, to develop and assess a technology-driven language learning environment, including a course toolkit with the end-user mobile and web-based applications that we developed; and second, to develop tools for speech annotation and semantic analysis based on intonation patterns and digital signal processing algorithms.
|
Strategy for Future Research Activity |
During assessment, our digital signal processing core produced inaccuracies when constructing the phonetic transcription of colloquial speech. To the best of our knowledge, these inaccuracies stem from the ASR model used (e.g. one trained on LibriSpeech), which is trained on audiobooks performed by professional actors. One problem commonly faced when implementing a CAPT system is how to establish a relevant and adequately tailored feedback mechanism. First and most important, we need feedback that lets both the teacher and the learner identify and evaluate segmental and suprasegmental errors. Second, we need feedback to evaluate current progress and to suggest steps for improvement within the system. Third, teachers are often interested in behavioral feedback from their students, including their interests, involvement and engagement. Finally, there are also usability aspects. Although StudyIntonation provides feedback in the form of visuals and some numeric scores, there are still open issues in our design: (1) metric adequacy and sensitivity to phonemic, rhythmic and intonational distortions; (2) feedback limitations when learners are not verbally instructed on what to do to improve; (3) a rigid interface in which the graphs are not interactive; and (4) the effect of context, which produces multiple prosodic portraits of the same phrase that are difficult to display simultaneously.
|
Causes of Carryover |
Due to COVID-19 restrictions, we could not spend the funds budgeted for travel and workshop organization. They are therefore carried over to the next fiscal year with the same usage plan as in 2020.
|
Research Products
(4 results)
[Journal Article] Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching (2021)
Author(s)
N. Bogach, E. Boitsova, S. Chernonog, A. Lamtev, M. Lesnychaya, I. Lezhenin, A. Novopashenny, R. Svechnikov, D. Tsikach, K. Vasiliev, J. Blake, and E. Pyshkin
Journal Title
Electronics
Volume: 10 (3), 235
Pages: 1 - 22
Peer Reviewed / Open Access / Int'l Joint Research