Estimation of speech content from vocal movements by fusion of multiple sensors and its application to speech assistance devices

Research Project

Project/Area Number	21K11941
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010:Perceptual information processing-related
Research Institution	Nippon Institute of Technology
Principal Investigator	Ota Kenko, Nippon Institute of Technology, Faculty of Fundamental Engineering, Assistant Professor (50511911)
Project Period (FY)	2021-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2023: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2022: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2021: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
Keywords	Silent speech recognition / Deep learning / 3D measurement / Phoneme / Speech recognition / Machine learning / Biological information / Sensor technology
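Outline of Research at the Start	This study aims to establish a technique for recognizing speech content without using the voice and to put that technique into practical use. To this end, we will identify sensor technologies for quantifying the shape and movement of the mouth. To make the recognition technique more practical, we will also identify features that are robust to natural human speaking movements and to body movements during speech. The results are expected to apply to a wide range of fields. These include devices that assist speech production for people who have acquired difficulty speaking due to causes such as laryngeal cancer, speech recognition in environments with severe noise and reverberation, privacy protection in conversation, and crime prevention.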
Outline of Final Research Achievements	Throughout the research period, this study investigated two kinds of systems. The first assists people who have difficulty speaking, such as those whose vocal cords have been removed. The second supports existing speech recognition. As a result, we studied a deep-learning technique that recognizes sentences phoneme by phoneme without using acoustic speech information. As methods for assisting speech production, we also studied emotion estimation and text-to-speech synthesis. The emotion estimation used a camera and a sensor that measures the galvanic skin response of the fingers.
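Academic Significance and Societal Importance of the Research Achievements	The academic significance of this study lies in its work on speech recognition that does not use acoustic information. Specifically, it investigated data acquisition methods and deep neural networks for phoneme-level sentence recognition with deep learning. The social significance lies in the separate work on speaker emotion estimation and speech synthesis. This work laid a basic foundation for developing speech assistance devices for people who have difficulty speaking and identified the remaining challenges.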

Report

(4 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Research-status Report
2021 Research-status Report

Research Products
(5 results)


Presentation (5 results) (of which Int'l Joint Research: 1 result)

[Presentation] A study of a Japanese machine lip-reading method based on time-series data of lip feature points (2024)
- Author(s)
  大田健紘, 久保茜, 倉島廉
- Organizer
  The Institute of Electronics, Information and Communication Engineers
- Related Report
  2023 Annual Research Report
[Presentation] Silent speech recognition using data augmentation based on a 3D lip model (2023)
- Author(s)
  Kenko Ota
- Organizer
  Acoustical Society of America
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Effect of data augmentation using a 3D model on recognition accuracy in machine lip reading (2023)
- Author(s)
  木村一馬, 大田健紘
- Organizer
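  The Institute of Electronics, Information and Communication Engineers, Technical Committee on Healthcare and Medical Information Communication Technology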
- Related Report
  2022 Research-status Report
[Presentation] A study on silent word recognition using deep learning with a 3D face model (2022)
- Author(s)
  和田竜二, 大田健紘
- Organizer
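  The Institute of Electronics, Information and Communication Engineers, Technical Committee on Healthcare and Medical Information Communication Technology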
- Related Report
  2021 Research-status Report
[Presentation] A study on silent word recognition using multiple sensors (2022)
- Author(s)
  草本雅也, 大田健紘
- Organizer
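  The Institute of Electronics, Information and Communication Engineers, Technical Committee on Healthcare and Medical Information Communication Technology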
- Related Report
  2021 Research-status Report
