2020 Fiscal Year Research-status Report

Cross-disciplinary approach to prosody-based automatic speech processing and its application to computer-assisted language teaching

Research Project

Project/Area Number 20K00838
Research Institution The University of Aizu

Principal Investigator

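Pyshkin Evgeny  University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (50794088)

Co-Investigator(Kenkyū-buntansha) Mozgovoy Maxim  University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776)
BLAKE John  University of Aizu, School of Computer Science and Engineering, Associate Professor (80635954)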
Project Period (FY) 2020-04-01 – 2023-03-31
Keywords Speech processing / CAPT / Audio-visual feedback / ASR / Language prosody
Outline of Annual Research Achievements

We completed a study on the potential of pronunciation teaching supported by speech processing algorithms, and on its individualization through computer-aided prosody modeling and visualization instruments. We applied voice activity detection and equipped our StudyIntonation learning environment with automated speech recognition algorithms. Given the phonemes together with their duration and energy, the rhythmic pattern can be retrieved. Transcription and phrasal rhythm are visualized alongside phrasal intonation, which is shown by pitch curves. We reorganised the CAPT courseware to represent each task as a hierarchical phonological structure containing an intonation curve, a rhythmic pattern and an IPA transcription. We also started a project on adapting StudyIntonation to the particular case of tonal languages.
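
As an illustration only, the task structure and the rhythm retrieval described above might be sketched as follows. The class names, field names and normalization scheme are assumptions made for this sketch and are not taken from the StudyIntonation code base; in particular, expressing rhythm as relative duration and relative energy per phoneme is just one plausible reading of "rhythmic pattern".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phoneme:
    """One recognized phoneme (hypothetical structure for this sketch)."""
    ipa: str         # IPA symbol, e.g. "ə"
    duration: float  # duration in seconds
    energy: float    # mean energy in arbitrary units


@dataclass
class Task:
    """A courseware task: a phrase with its pitch curve and phonemes."""
    text: str
    pitch_curve: List[float]  # sampled pitch values in Hz
    phonemes: List[Phoneme]

    def transcription(self) -> str:
        """Concatenate the IPA symbols into a phrase transcription."""
        return "".join(p.ipa for p in self.phonemes)

    def rhythmic_pattern(self) -> List[Tuple[str, float, float]]:
        """Return (ipa, relative duration, relative energy) per phoneme.

        Durations are divided by the phrase duration and energies by the
        peak energy, so the pattern is comparable across speakers.
        """
        if not self.phonemes:
            return []
        total = sum(p.duration for p in self.phonemes) or 1.0
        peak = max(p.energy for p in self.phonemes) or 1.0
        return [(p.ipa, p.duration / total, p.energy / peak)
                for p in self.phonemes]


# Usage with made-up values for the word "hello".
task = Task(
    text="hello",
    pitch_curve=[120.0, 130.0, 125.0, 110.0],
    phonemes=[
        Phoneme("h", 0.05, 0.2),
        Phoneme("ə", 0.05, 0.4),
        Phoneme("l", 0.10, 0.5),
        Phoneme("oʊ", 0.20, 1.0),
    ],
)
for ipa, rel_duration, rel_energy in task.rhythmic_pattern():
    print(f"{ipa}\t{rel_duration:.2f}\t{rel_energy:.2f}")
```

Normalizing by phrase duration and peak energy is a design choice of the sketch: it makes a learner's pattern comparable with a model speaker's pattern regardless of speaking rate or loudness.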

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

Early design assessments demonstrate both the high potential of the StudyIntonation environment and the improvements required to create a convenient, intuitive and interactive CAPT environment. The usability of CAPT tools increases if they are able to display the features of natural connected speech such as elision, assimilation, deletion, juncture, etc. At the word level, the following pronunciation aspects can be trained: stress positioning; effects of stressed and unstressed syllables, e.g. vowel reduction; and tone movement. Correspondingly, at the phrasal level, learners can observe sentence accent placement, rhythmic pattern production, and phrasal intonation movements related to communicative functions.
The practical purpose of the StudyIntonation project is twofold: first, to develop and assess a technology-driven language learning environment, including a course toolkit with the end-user mobile and web-based applications that we developed; and second, to develop tools for speech annotation and semantic analysis based on intonation patterns and digital signal processing algorithms.

Strategy for Future Research Activity

During assessment, our digital signal processing core produced inaccuracies when constructing the phonetic transcription of colloquial speech. To the best of our knowledge, these inaccuracies stem from the ASR model used, which is trained on audiobook recordings (e.g. LibriSpeech) whose carefully read speech differs from colloquial speech.
One problem commonly faced when implementing a CAPT system is how to establish a relevant, adequate and tailored feedback mechanism. First and most important, we need feedback that enables both the teacher and the learner to identify and evaluate segmental and suprasegmental errors. Second, we need feedback to evaluate current progress and to suggest steps for improvement within the system. Third, teachers are often interested in behavioral feedback from their students, including their interests, involvement and engagement. Finally, there are also usability aspects. Although StudyIntonation provides feedback in the form of visuals and some numeric scores, there are still open issues in our design, such as (1) metric adequacy and sensitivity to phonemic, rhythmic and intonational distortions; (2) feedback limitations when learners are not verbally instructed what to do to improve; (3) a rigid interface in which the graphs are not interactive; and (4) the effect of context, which produces multiple prosodic portraits of the same phrase that are difficult to display simultaneously.
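
To make open issue (1) concrete, the sketch below scores two pitch contours with dynamic time warping (DTW). DTW is an assumption chosen for illustration: this report does not name the metric behind the numeric scores, and the function name and sample values are hypothetical. The sketch shows how a score of this kind can react to an intonational distortion while staying blind to a rhythmic one.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two contours, using absolute
    difference as the local cost and unconstrained warping."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Made-up pitch contours in Hz.
model = [100.0, 120.0, 140.0, 120.0]
stretched = [100.0, 120.0, 120.0, 140.0, 140.0, 120.0]  # same shape, slower
flattened = [100.0, 105.0, 110.0, 105.0]                # reduced pitch range

print(dtw_distance(model, stretched))
print(dtw_distance(model, flattened))
```

Because unconstrained warping absorbs timing differences, the stretched contour aligns with the model at no cost, whereas the flattened contour does not. A score built this way would therefore need a separate rhythm term to reflect rhythmic distortions.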

Causes of Carryover

Due to COVID-19 restrictions, we could not spend the funds budgeted for travel and workshop organization. These funds therefore need to be carried over to the next fiscal year, with the same usage plan as in 2020.

  • Research Products

    (4 results)

All Int'l Joint Research (1 results) Journal Article (2 results) (of which Int'l Joint Research: 2 results,  Peer Reviewed: 2 results,  Open Access: 2 results) Remarks (1 results)

  • [Int'l Joint Research] St. Petersburg Polytechnic University (Russian Federation)

    • Country Name
      RUSSIAN FEDERATION
    • Counterpart Institution
      St. Petersburg Polytechnic University
  • [Journal Article] Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching (2021)

    • Author(s)
      N. Bogach, E. Boitsova, S. Chernonog, A. Lamtev, M. Lesnychaya, I. Lezhenin, A. Novopashenny, R. Svechnikov, D. Tsikach, K. Vasiliev, J. Blake, and E. Pyshkin
    • Journal Title

      Electronics

      Volume: 10 (3), 235 Pages: 1 - 22

    • DOI

      10.3390/electronics10030235

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] A Metaphoric Bridge: Understanding Software Engineering Education through Literature and Fine Arts (2020)

    • Author(s)
      E. Pyshkin and J. Blake
    • Journal Title

      Society. Communication. Education

      Volume: 11 (3) Pages: 59 - 77

    • DOI

      10.18721/JHSS.11305

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Remarks] Study Intonation: English Intonation Training

    • URL

      http://studyintonation.org/

Published: 2021-12-27  
