Multi-Modal Speech Enhancement Using Mobile Device

Research Project

Project/Area Number	19K12905
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 90150: Medical assistive technology-related
Research Institution	Osaka Institute of Technology
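Principal Investigator	MATSUI Kenji, Osaka Institute of Technology, Faculty of Robotics and Design, Professor (30613682)
Co-Investigator (Kenkyū-buntansha)	NAKATOH Yoshihisa, Kyushu Institute of Technology, Faculty of Engineering, Professor (10599955); KATO Yumiko, St. Marianna University School of Medicine, School of Medicine, Researcher (10600463); MIZUMACHI Mitsunori, Kyushu Institute of Technology, Faculty of Engineering, Associate Professor (90380740)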
Project Period (FY)	2019-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000); Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000); Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
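Keywords	machine lip-reading / speech assistance / variational autoencoder / viseme / depth image / portable device / artificial larynx / lip image recognition / mobile device / consonant recognition / speech synthesis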
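Outline of Research at the Start	The aim of this research is to develop a voice amplification or speech assistance device that leaves the user looking no different from a person without a speech impairment, so that it can be used without worrying about the attention of others. We further aim for a device that is low-cost, lightweight, easy to use, and adaptable to the user. Making maximal use of a smartphone and lip-image information, we will build a system that, unlike any previous artificial larynx, can be used simply by installing an app on a smartphone. Specifically, we will realize two functions: speech enhancement with wireless voice amplification through a small loudspeaker, mainly for users of esophageal speech, and a speaking function based on lip-reading followed by speech synthesis, for users who mainly rely on an artificial larynx. The goal is a set of smartphone-based speech assistance functions that can adapt to diverse user characteristics.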
Outline of Final Research Achievements	We have been developing a speech enhancement device for laryngectomees. Our approach uses lip-reading technology to recognize Japanese words from lip images and to generate speech output on mobile devices. The target words are translated into the 36 registered viseme sequences and converted into VAE (variational autoencoder) feature parameters; the corresponding words are then recognized with a CNN-based model. A PC-based prototype achieved more than 90% accuracy on 20 Japanese words with a single well-trained subject. We also developed a mobile-device-based prototype and ran a preliminary recognition experiment on 26 words with a single well-trained subject. When the 1st through 6th candidates were counted, 95% accuracy was obtained, which is almost equivalent to the PC-based system. To improve consonant recognition, a depth camera was introduced and gave slightly better accuracy; however, more careful algorithm tuning is necessary.
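Academic Significance and Societal Importance of the Research Achievements	When speaking becomes difficult because of illness or an accident, as it does for laryngectomees, people use substitute voices such as an electrolarynx or esophageal speech. However, these are conspicuous in use and take time to learn. Users actually ask that a solution "can be used with existing devices", "has an inconspicuous appearance", and "has an easy-to-use interface". For this reason, speech assistance based on machine lip-reading has been studied. The distinguishing feature of this research is a phrase recognition method for machine lip-reading that uses visemes and a variational autoencoder and makes word registration extremely easy; the method was also implemented on a mobile device, and its effectiveness and open issues were examined. In addition, depth images are used to improve the accuracy of consonant recognition in machine lip-reading, which is highly significant as a step toward demonstration experiments.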

Report

(4 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report

Research Products
(12 results)

All Journal Article (2 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 2 results) Presentation (10 results) (of which Int'l Joint Research: 5 results)

[Journal Article] Development of Mobile Device-Based Speech Enhancement System Using Lip-Reading (2022)
- Author(s)
  Fumiaki Eguchi, Kenji Matsui, Yoshihisa Nakatoh, Yumiko O. Kato, Alberto Rivas, Juan Manuel Corchado
- Journal Title
  Distributed Computing and Artificial Intelligence, Volume 1: 18th International Conference
  Volume: 1 Pages: 210-220
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Mobile Device-based Speech Enhancement System Using Lip-reading (2020)
- Author(s)
  Tomonori Nakahara, Kohei Fukuyama, Mitsuru Hamada, Kenji Matsui, Yoshihisa Nakatoh, Yumiko O. Kato, Alberto Rivas, Juan Manuel Corchado
- Journal Title
  Advances in Intelligent Systems and Computing
  Volume: 1237 Pages: 159-167
- DOI
  10.1007/978-3-030-53036-5_17
- ISBN
  9783030530358, 9783030530365
- Related Report
  2020 Research-status Report 2019 Research-status Report
- Peer Reviewed / Int'l Joint Research
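[Presentation] A Study of Speech Assistance by Lip Recognition Using a Mobile Device (2021)
- Author(s)
  Fumiaki Eguchi, Kenji Matsui, Yoshihisa Nakatoh, Yumiko Kato
- Organizer
  Acoustical Society of Japan, 2021 Autumn Meeting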
- Related Report
  2021 Annual Research Report
[Presentation] Effective Selection Method of Microphones for Conversation Assistance in Noisy Environment (2021)
- Author(s)
  Mizuki Horii, Rin Hirakawa, Hideaki Kawano, Yoshihisa Nakatoh
- Organizer
  5th International Conference on Human Interaction and Emerging Technologies
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Speaker identification method using bone conduction and throat microphones (2021)
- Author(s)
  Takeshi Hashiguchi, Rin Hirakawa, Hideaki Kawano, Yoshihisa Nakatoh
- Organizer
  5th International Conference on Human Interaction and Emerging Technologies
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Speech Enhancement System Using SVM for Train Announcement (2021)
- Author(s)
  Yuto Kinoshita, Rin Hirakawa, Hideaki Kawano, Kenichi Nakashi, Yoshihisa Nakatoh
- Organizer
  The 39th IEEE International Conference on Consumer Electronics (IEEE ICCE 2021)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Speech Enhancement System Using Lip-reading (2020)
- Author(s)
  Kenji Matsui, Kohei Fukuyama, Yoshihisa Nakatoh, Yumiko O. Kato
- Organizer
  2nd IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
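[Presentation] A Study of a Phrase Recognition Method Using Viseme Sequences for Speech Assistance (2020)
- Author(s)
  Tomonori Nakahara, Kohei Fukuyama, Kenji Matsui, Yoshihisa Nakatoh, Yumiko Kato
- Organizer
  Acoustical Society of Japan, 2020 Autumn Meeting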
- Related Report
  2020 Research-status Report
[Presentation] Mobile Device-based Speech Enhancement System Using Lip-reading (2020)
- Author(s)
  Tomonori Nakahara, Kohei Fukuyama, Mitsuru Hamada, Kenji Matsui, Yoshihisa Nakatoh, Yumiko O. Kato, Alberto Rivas, Juan Manuel Corchado
- Organizer
  17th International Conference on Distributed Computing and Artificial Intelligence
- Related Report
  2019 Research-status Report
- Int'l Joint Research
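[Presentation] Development of a Speech Assistance Device Using Lip Information on a Mobile Device (2020)
- Author(s)
  Mitsuru Hamada, Kohei Fukuyama, Kenji Matsui, Yoshihisa Nakatoh, Yumiko Kato
- Organizer
  Acoustical Society of Japan, 2020 Spring Meeting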
- Related Report
  2019 Research-status Report
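[Presentation] A Study of Lip-Reading Methods for Speech Assistance (2020)
- Author(s)
  Kohei Fukuyama, Kenji Matsui, Yoshihisa Nakatoh, Yumiko Kato
- Organizer
  Acoustical Society of Japan, 2020 Spring Meeting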
- Related Report
  2019 Research-status Report
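[Presentation] A Study of a Speech Assistance Method Using a Mobile Device and Lip Information (2019)
- Author(s)
  Kohei Fukuyama, Mitsuru Hamada, Kenji Matsui, Yoshihisa Nakatoh, Yumiko Kato
- Organizer
  Acoustical Society of Japan, 2019 Autumn Meeting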
- Related Report
  2019 Research-status Report
