Investigation of method optimization for multi-modal speech recognition

Research Project

Project/Area Number	25730109
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	Gifu University
Principal Investigator	Tamura Satoshi Gifu University, Faculty of Engineering, Assistant Professor (10402215)
Project Period (FY)	2013-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000); Fiscal Year 2015: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2014: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2013: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Keywords	speech recognition / multi-modal information processing / lipreading / optimization / real environments
Outline of Final Research Achievements	This research aimed to optimize multi-modal speech recognition, which uses speech signals and lip images, according to tasks and environments. It clarified the effectiveness of incorporating several basic features and applying deep-learning techniques, the optimal architecture for audio-visual integration, the effectiveness of stochastic model combination, and improvements in model adaptation. A robust, high-performance multi-modal speech recognition method was thus developed. The method was applied to various tasks and environments; recognition performance improved, and issues for future work were also identified.

Report

(4 results)

2015 Annual Research Report, Final Research Report (PDF)
2014 Research-status Report
2013 Research-status Report

Research Products
(16 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result) Presentation (15 results) (of which Int'l Joint Research: 6 results, Invited: 1 result)

[Journal Article] Multistream sparse representation features for noise robust audio-visual speech recognition (2014)
- Author(s)
  Peng Shen, Satoshi Tamura, Satoru Hayamizu
- Journal Title
  Acoustical Science and Technology
  Volume: 35 Issue: 1 Pages: 17-27
- DOI
  10.1250/ast.35.17
- NAID
  130003381833
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Presentation] Visual speech recognition using optical and depth image features (2016)
- Author(s)
  Satoshi Tamura, Takuya Kawasaki, Koichi Miyazaki, Kazuto Ukai and Satoru Hayamizu
- Organizer
  FCV2016
- Place of Presentation
  Takayama, Japan
- Year and Date
  2016-02-17
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Audio-visual speech recognition using deep bottleneck features and high-performance lipreading (2015)
- Author(s)
  Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
- Organizer
  APSIPA ASC 2015
- Place of Presentation
  Hong Kong, China
- Year and Date
  2015-12-16
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
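[Presentation] Multi-modal speech recognition using bottleneck features obtained by deep learning (2015)
- Author(s)
  Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu
- Organizer
  IEICE Technical Report
- Place of Presentation
  Kobe University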
- Year and Date
  2015-10-15
- Related Report
  2015 Annual Research Report
[Presentation] Audio-visual processing toward robust speech recognition in cars (2015)
- Author(s)
  Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
- Organizer
  DSP in Vehicle 2015
- Place of Presentation
  San Francisco, U.S.A.
- Year and Date
  2015-10-14
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Investigation of DNN-based modeling for audio-visual speech recognition (2015)
- Author(s)
  Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
- Organizer
  MLSLP2015
- Place of Presentation
  Aizu, Japan
- Year and Date
  2015-09-19
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
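[Presentation] Multi-modal speech recognition using audio and visual features obtained by deep learning (2015)
- Author(s)
  Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu
- Organizer
  Acoustical Society of Japan 2015 Autumn Meeting
- Place of Presentation
  University of Aizu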
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
[Presentation] Stream weight estimation using higher order statistics in multi-modal speech recognition (2015)
- Author(s)
  Kazuto Ukai, Satoshi Tamura and Satoru Hayamizu
- Organizer
  FAAVSP2015
- Place of Presentation
  Vienna, Austria
- Year and Date
  2015-09-11
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Integration of deep bottleneck features for audio-visual speech recognition (2015)
- Author(s)
  Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe and Kazuya Takeda
- Organizer
  INTERSPEECH2015
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] Data collection for mobile audio-visual speech recognition in various environments (2014)
- Author(s)
  Satoshi Tamura, Takumi Seko and Satoru Hayamizu
- Organizer
  Oriental COCOSDA 2014 (international conference)
- Place of Presentation
  Phuket, Thailand
- Year and Date
  2014-09-11
- Related Report
  2014 Research-status Report
[Presentation] Speaking-face detection for multimodal person recognition in TV shows (2014)
- Author(s)
  Satoshi Tamura and Herve Bredin
- Organizer
  Acoustical Society of Japan 2014 Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University
- Year and Date
  2014-09-05
- Related Report
  2014 Research-status Report
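[Presentation] Investigation of a model adaptation method based on cooperation between audio and images in multi-modal speech recognition (2014)
- Author(s)
  Takuya Kinuta (絹田卓也), Satoshi Tamura, Satoru Hayamizu
- Organizer
  13th Forum on Information Technology (FIT2014)
- Place of Presentation
  University of Tsukuba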
- Year and Date
  2014-09-05
- Related Report
  2014 Research-status Report
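[Presentation] Application of a multi-modal speech interface in real environments (2014)
- Author(s)
  Takumi Seko, Takuya Kawasaki, Satoshi Tamura, Satoru Hayamizu
- Organizer
  IEICE Technical Report (Pattern Recognition and Media Understanding)
- Place of Presentation
  Waseda University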
- Related Report
  2013 Research-status Report
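[Presentation] Integrated use of audio and images with multi-modal information processing techniques (2014)
- Author(s)
  Satoshi Tamura
- Organizer
  1st Silent Speech Recognition Group Lecture Meeting
- Place of Presentation
  Kyushu Institute of Technology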
- Related Report
  2013 Research-status Report
- Invited
[Presentation] Improvement of lipreading performance using discriminative feature and speaker adaptation (2013)
- Author(s)
  Takumi Seko, Naoya Ukai, Satoshi Tamura and Satoru Hayamizu
- Organizer
  AVSP2013 (international conference)
- Place of Presentation
  Annecy, France
- Related Report
  2013 Research-status Report
[Presentation] Improvement of lip reading performance in real environments using speaker and environmental adaptation (2013)
- Author(s)
  Takuya Kawasaki, Naoya Ukai, Takumi Seko, Satoshi Tamura and Satoru Hayamizu
- Organizer
  ACPR2013 (international conference)
- Place of Presentation
  Okinawa, Japan
- Related Report
  2013 Research-status Report
