2021 Fiscal Year Final Research Report

Multi-Modal Speech Enhancement Using Mobile Device

Project/Area Number	19K12905
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 90150:Medical assistive technology-related
Research Institution	Osaka Institute of Technology
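Principal Investigator	MATSUI Kenji, Osaka Institute of Technology, Faculty of Robotics and Design, Professor (30613682)
Co-Investigator(Kenkyū-buntansha)	NAKATOH Yoshihisa, Kyushu Institute of Technology, Faculty of Engineering, Professor (10599955); KATO Yumiko, St. Marianna University School of Medicine, School of Medicine, Researcher (10600463); MIZUMACHI Mitsunori, Kyushu Institute of Technology, Faculty of Engineering, Associate Professor (90380740)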
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	Machine lip-reading / Speech support / Variational autoencoder / Viseme / Depth image / Mobile device
Outline of Final Research Achievements	We have been developing a speech enhancement device for laryngectomees. Our approach uses lip-reading technology to recognize Japanese words from lip images and to generate speech output on mobile devices. The target words are translated into sequences of 36 registered visemes and converted into VAE (Variational Autoencoder) feature parameters. The corresponding words are then recognized with a CNN-based model. A PC-based prototype was tested and achieved more than 90% accuracy on 20 Japanese words with a single well-trained subject. We also developed a mobile-device-based prototype and conducted a preliminary recognition experiment on 26 words with a single well-trained subject; 95% accuracy was obtained when the 1st through 6th candidates were counted, which is almost equivalent to the PC-based system. To improve consonant recognition, a depth camera was introduced and yielded slightly better accuracy; however, more careful algorithm tuning is still necessary.
Free Research Field	Speech signal processing
Academic Significance and Societal Importance of the Research Achievements	When people such as laryngectomees lose the ability to speak because of illness or an accident, they rely on substitute voices such as an electrolarynx or esophageal speech. However, these methods are conspicuous in use and take a long time to master. In fact, users have said they want something that "works with devices they already own", "has an inconspicuous appearance", and "has an easy-to-use interface". For these reasons, speech support based on machine lip-reading has been studied. The distinctive feature of this research is a phrase recognition method based on machine lip-reading that uses visemes and a variational autoencoder, which makes registering new words extremely easy; we also implemented it on a mobile device and examined its effectiveness and remaining issues. In addition, we used depth images to improve the accuracy of consonant recognition in machine lip-reading, which is a significant step toward field trials.
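The 95% figure for the mobile prototype counts a trial as correct when the spoken word appears anywhere among the recognizer's first six candidates, i.e. a top-6 accuracy. A minimal Python sketch of that metric is given below as an illustration only; the function name `top_k_accuracy`, the ranked-candidate-list format, and the toy word lists are assumptions made up for this example, not the project's code or experimental data.

```python
from typing import Sequence


def top_k_accuracy(ranked_candidates: Sequence[Sequence[str]],
                   targets: Sequence[str], k: int = 6) -> float:
    """Fraction of trials whose target word appears among the first k candidates.

    ranked_candidates[i] is a hypothetical candidate list for trial i,
    ordered from most to least likely by the recognizer.
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("one candidate list per target is required")
    if not targets:
        raise ValueError("at least one trial is required")
    # A trial is a hit if the target occurs anywhere in the top-k slice.
    hits = sum(1 for cands, target in zip(ranked_candidates, targets)
               if target in cands[:k])
    return hits / len(targets)


# Toy data (invented for illustration, not results from this project).
trials = [
    ["ohayou", "arigatou", "konnichiwa"],
    ["mizu", "ohayou", "sumimasen"],
    ["hai", "iie", "mizu"],
    ["iie", "hai", "arigatou"],
]
targets = ["ohayou", "ohayou", "arigatou", "hai"]

print(top_k_accuracy(trials, targets, k=1))  # 0.25
print(top_k_accuracy(trials, targets, k=2))  # 0.75
```

With k=1 only the first trial counts as correct; widening to k=2 also credits the second and fourth trials, which is why a top-6 rate is generally higher than the strict first-candidate rate of the same recognizer.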