
Investigation of method optimization for multi-modal speech recognition

Research Project

Project/Area Number 25730109
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution Gifu University

Principal Investigator

Tamura Satoshi  Gifu University, Faculty of Engineering, Assistant Professor (10402215)

Project Period (FY) 2013-04-01 – 2016-03-31
Project Status Completed (Fiscal Year 2015)
Budget Amount
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2015: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2014: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2013: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Keywords speech recognition / multi-modal information processing / lipreading / optimization / real environments
Outline of Final Research Achievements

This research aimed to optimize methods for multi-modal speech recognition, which uses speech signals and lip images, according to the task and environment. It clarified the effectiveness of incorporating several basic features and applying deep-learning techniques, the optimal architecture for audio-visual integration together with the effectiveness of combining stochastic models, and ways to improve model adaptation. A robust, high-performance multi-modal speech recognition method was developed as a result. Applied to various tasks and environments, the method improved recognition performance, and the experiments also identified directions for future work.

Report

(4 results)
  • 2015 Annual Research Report
  • Final Research Report (PDF)
  • 2014 Research-status Report
  • 2013 Research-status Report

Research Products

(16 results)

Journal Article (1 result, of which Peer Reviewed: 1), Presentation (15 results, of which Int'l Joint Research: 6, Invited: 1)

  • [Journal Article] Multistream sparse representation features for noise robust audio-visual speech recognition (2014)

    • Author(s)
      Peng Shen, Satoshi Tamura, Satoru Hayamizu
    • Journal Title

      Acoustical Science and Technology

      Volume: 35 Issue: 1 Pages: 17-27

    • DOI

      10.1250/ast.35.17

    • NAID

      130003381833

    • ISSN
      0369-4232, 1346-3969, 1347-5177
    • Related Report
      2013 Research-status Report
    • Peer Reviewed
  • [Presentation] Visual speech recognition using optical and depth image features (2016)

    • Author(s)
      Satoshi Tamura, Takuya Kawasaki, Koichi Miyazaki, Kazuto Ukai and Satoru Hayamizu
    • Organizer
      FCV2016
    • Place of Presentation
      Takayama, Japan
    • Year and Date
      2016-02-17
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Audio-visual speech recognition using deep bottleneck features and high-performance lipreading (2015)

    • Author(s)
      Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
    • Organizer
      APSIPA ASC 2015
    • Place of Presentation
      Hong Kong, China
    • Year and Date
      2015-12-16
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
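  • [Presentation] Multi-modal speech recognition using bottleneck features obtained by deep learning (2015)

    • Author(s)
      Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu
    • Organizer
      IEICE Technical Report
    • Place of Presentation
      Kobe University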
    • Year and Date
      2015-10-15
    • Related Report
      2015 Annual Research Report
  • [Presentation] Audio-visual processing toward robust speech recognition in cars (2015)

    • Author(s)
      Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
    • Organizer
      DSP in Vehicle 2015
    • Place of Presentation
      San Francisco, U.S.A.
    • Year and Date
      2015-10-14
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Investigation of DNN-based modeling for audio-visual speech recognition (2015)

    • Author(s)
      Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda and Satoru Hayamizu
    • Organizer
      MLSLP2015
    • Place of Presentation
      Aizu, Japan
    • Year and Date
      2015-09-19
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
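  • [Presentation] Multi-modal speech recognition using audio and visual features obtained by deep learning (2015)

    • Author(s)
      Satoshi Tamura, Hiroshi Ninomiya, Norihide Kitaoka, Shin Osuga, Yurie Iribe, Kazuya Takeda, Satoru Hayamizu
    • Organizer
      Acoustical Society of Japan 2015 Autumn Meeting
    • Place of Presentation
      University of Aizu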
    • Year and Date
      2015-09-16
    • Related Report
      2015 Annual Research Report
  • [Presentation] Stream weight estimation using higher order statistics in multi-modal speech recognition (2015)

    • Author(s)
      Kazuto Ukai, Satoshi Tamura and Satoru Hayamizu
    • Organizer
      FAAVSP2015
    • Place of Presentation
      Vienna, Austria
    • Year and Date
      2015-09-11
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Integration of deep bottleneck features for audio-visual speech recognition (2015)

    • Author(s)
      Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe and Kazuya Takeda
    • Organizer
      INTERSPEECH2015
    • Place of Presentation
      Dresden, Germany
    • Year and Date
      2015-09-06
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Data collection for mobile audio-visual speech recognition in various environments (2014)

    • Author(s)
      Satoshi Tamura, Takumi Seko and Satoru Hayamizu
    • Organizer
      Oriental COCOSDA 2014 (international conference)
    • Place of Presentation
      Phuket, Thailand
    • Year and Date
      2014-09-11
    • Related Report
      2014 Research-status Report
  • [Presentation] Speaking-face detection for multimodal person recognition in TV shows (2014)

    • Author(s)
      Satoshi Tamura and Herve Bredin
    • Organizer
      Acoustical Society of Japan 2014 Autumn Meeting
    • Place of Presentation
      Hokkai-Gakuen University
    • Year and Date
      2014-09-05
    • Related Report
      2014 Research-status Report
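  • [Presentation] A study of model adaptation through audio-visual cooperation in multi-modal speech recognition (2014)

    • Author(s)
      絹田卓也, Satoshi Tamura, Satoru Hayamizu
    • Organizer
      13th Forum on Information Technology (FIT2014)
    • Place of Presentation
      University of Tsukuba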
    • Year and Date
      2014-09-05
    • Related Report
      2014 Research-status Report
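  • [Presentation] Application of a multi-modal speech interface in real environments (2014)

    • Author(s)
      Takumi Seko, Takuya Kawasaki, Satoshi Tamura, Satoru Hayamizu
    • Organizer
      IEICE Technical Report (Pattern Recognition and Media Understanding)
    • Place of Presentation
      Waseda University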
    • Related Report
      2013 Research-status Report
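  • [Presentation] Integrated use of speech and images using multi-modal information processing technology (2014)

    • Author(s)
      Satoshi Tamura
    • Organizer
      1st Silent Speech Recognition Group Lecture Meeting
    • Place of Presentation
      Kyushu Institute of Technology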
    • Related Report
      2013 Research-status Report
    • Invited
  • [Presentation] Improvement of lipreading performance using discriminative feature and speaker adaptation (2013)

    • Author(s)
      Takumi Seko, Naoya Ukai, Satoshi Tamura and Satoru Hayamizu
    • Organizer
      AVSP2013 (international conference)
    • Place of Presentation
      Annecy, France
    • Related Report
      2013 Research-status Report
  • [Presentation] Improvement of lip reading performance in real environments using speaker and environmental adaptation (2013)

    • Author(s)
      Takuya Kawasaki, Naoya Ukai, Takumi Seko, Satoshi Tamura and Satoru Hayamizu
    • Organizer
      ACPR2013 (international conference)
    • Place of Presentation
      Okinawa, Japan
    • Related Report
      2013 Research-status Report

Published: 2014-07-25   Modified: 2019-07-29  
