
Multimodal time-sequence data recognition platform based on deep learning

Research Project

Project/Area Number 16H02845
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Perceptual information processing
Research Institution Tokyo Institute of Technology

Principal Investigator

Shinoda Koichi  Tokyo Institute of Technology, School of Computing, Professor (10343097)

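Co-Investigator (Kenkyū-buntansha) Inoue Nakamasa  Tokyo Institute of Technology, School of Computing, Assistant Professor (10733397)
Iwano Koji  Tokyo City University, Faculty of Informatics, Professor (90323823)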
Project Period (FY) 2016-04-01 – 2019-03-31
Project Status Completed (Fiscal Year 2018)
Budget Amount
¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000)
Fiscal Year 2018: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2017: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
Fiscal Year 2016: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000)
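The budget figures above are internally consistent: the three fiscal years sum to the stated totals, and in every year the indirect cost equals 30% of the direct cost. The 30% ratio is inferred from the listed amounts; this record does not state it.

$$
\begin{aligned}
\text{Direct: } & 4{,}600{,}000 + 4{,}700{,}000 + 3{,}000{,}000 = 12{,}300{,}000 \\
\text{Indirect: } & 1{,}380{,}000 + 1{,}410{,}000 + 900{,}000 = 3{,}690{,}000 \\
\text{Total: } & 12{,}300{,}000 + 3{,}690{,}000 = 15{,}990{,}000 \\
\text{Indirect / Direct: } & \tfrac{1{,}380{,}000}{4{,}600{,}000} = \tfrac{1{,}410{,}000}{4{,}700{,}000} = \tfrac{900{,}000}{3{,}000{,}000} = 0.30
\end{aligned}
$$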
Keywords Perceptual information processing / Speech information processing / Video information processing / Deep learning
Outline of Final Research Achievements

This research aims to accurately recognize multimodal time-sequence signals using deep learning. We applied various deep learning techniques, such as end-to-end training, deep networks that can be trained with a small amount of data, multi-task learning, and noise-robust recognition. In particular, we improved recognition and detection performance in joint training of source separation and speech recognition, dementia detection from speech, multimodal speech recognition using lip reading, and noise-robust speech recognition.

Academic Significance and Societal Importance of the Research Achievements

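Over roughly the past decade, deep learning has become the standard technique for image recognition and speech recognition. However, many challenges remain, including how to exploit the prior knowledge that humans possess, performance degradation caused by differences in the surrounding environment or between speakers, and application to tasks for which large amounts of training data cannot be obtained. In this research, we proposed methods for end-to-end learning, efficient model training from small amounts of data, multi-task learning, and noise-robust recognition, which are key to solving these problems, and obtained a certain degree of success. These results can be readily applied to a variety of real-world problems.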

Report

(4 results)
  • 2018 Annual Research Report   Final Research Report (PDF)
  • 2017 Annual Research Report
  • 2016 Annual Research Report

Research Products

(41 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 2 results, Acknowledgement Compliant: 1 result) / Presentation (37 results) (of which Int'l Joint Research: 17 results, Invited: 9 results) / Book (1 result)

  • [Journal Article] Deep Learning in Spoken Language Processing: A Review (2017)

    • Author(s)
      Koichi Shinoda
    • Journal Title

      Journal of the Acoustical Society of Japan

      Volume: 73 Pages: 25-30

    • NAID

      130007355576

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] [Invited Paper] Semantic Indexing for Large-Scale Video Retrieval (2016)

    • Author(s)
      Nakamasa Inoue, Koichi Shinoda
    • Journal Title

      ITE Transactions on Media Technology and Applications

      Volume: 4 Issue: 3 Pages: 209-217

    • DOI

      10.3169/mta.4.209

    • NAID

      130005161897

    • ISSN
      2186-7364
    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Wise Teachers Train Better DNN Acoustic Models (2016)

    • Author(s)
      R. Price, K. Iso, K. Shinoda
    • Journal Title

      EURASIP Journal on Audio, Speech, and Music Processing

      Volume: 2016 Issue: 1 Pages: 1-19

    • DOI

      10.1186/s13636-016-0088-7

    • NAID

      120006582513

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Open Access
  • [Presentation] Information Science and Engineering: Present and Future (2019)

    • Author(s)
      Koichi Shinoda
    • Organizer
      The 40th Kuramae Science and Technology Seminar
    • Related Report
      2018 Annual Research Report
    • Invited
  • [Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2019)

    • Author(s)
      Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      IPSJ SIG Technical Report (SLP)
    • Related Report
      2018 Annual Research Report
  • [Presentation] A robust algorithm of phase recovery for speech enhancement (2019)

    • Author(s)
      Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
    • Organizer
      IEICE Technical Report (SP)
    • Related Report
      2018 Annual Research Report
  • [Presentation] Improving the robustness of multiple input spectrogram inversion (2019)

    • Author(s)
      Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
    • Organizer
      Proceedings of the 2019 Spring Meeting of the Acoustical Society of Japan
    • Related Report
      2018 Annual Research Report
  • [Presentation] SEQUENCE-LEVEL KNOWLEDGE DISTILLATION FOR MODEL COMPRESSION OF ATTENTION-BASED SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION (2019)

    • Author(s)
      Raden Mu’az Mun’im, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ICASSP 2019
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Co-Design for Deep Learning (2018)

    • Author(s)
      Koichi Shinoda
    • Organizer
      IEICE Technical Report (SP/PRMU)
    • Related Report
      2018 Annual Research Report
    • Invited
  • [Presentation] Event Detection from Videos Using Distributed Word Representations (2018)

    • Author(s)
      Satoshi Kanai, Nakamasa Inoue, Shi-wook Lee, Koichi Shinoda
    • Organizer
      The 21st Meeting on Image Recognition and Understanding (MIRU)
    • Related Report
      2018 Annual Research Report
  • [Presentation] Astronomical Image Subtraction for Transient Detection Using CNN (2018)

    • Author(s)
      Yan Long, Nakamasa Inoue, Koichi Shinoda, Yoichi Yatsu, Ryosuke Itoh, Nobuyuki Kawai
    • Organizer
      The 21st Meeting on Image Recognition and Understanding (MIRU)
    • Related Report
      2018 Annual Research Report
  • [Presentation] Alzheimer's Disease Prediction Using Audio Gated Convolutional Neural Network (2018)

    • Author(s)
      Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ASJ 2018 Autumn Meeting
    • Related Report
      2018 Annual Research Report
  • [Presentation] Generative Adversarial Network Based i-Vector Transformation for Short Utterance Speaker Verification (2018)

    • Author(s)
      Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ASJ 2018 Autumn Meeting
    • Related Report
      2018 Annual Research Report
  • [Presentation] A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition (2018)

    • Author(s)
      Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      British Machine Vision Conference (BMVC)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2018)

    • Author(s)
      Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      Interspeech
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification (2018)

    • Author(s)
      Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      Interspeech
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Few-Shot Adaptation for Multimedia Semantic Indexing (2018)

    • Author(s)
      Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ACM Multimedia
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] VANT at TRECVID 2018 (2018)

    • Author(s)
      Nakamasa Inoue, Chihiro Shiraishi, Aleksandr Drozd, Koichi Shinoda, Shi-wook Lee, Alex Chichung Kot
    • Organizer
      TRECVID workshop
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Skeleton-based Human Action Recognition with Fine-to-Coarse Convolutional Neural Network (2018)

    • Author(s)
      Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      Technical Reports of IEICE PRMU
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The NEC-TT Speaker Verification System for SRE’18 (2018)

    • Author(s)
      K. A. Lee, H. Yamamoto, K. Okabe, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda
    • Organizer
      NIST 2018 Speaker Recognition Evaluation
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Pitch Recognition of Polyphonic Signals Using a Fully Gated Two-Dimensional Convolutional Network (2018)

    • Author(s)
      生田目 敬弘, Hirokazu Kameoka, Koichi Shinoda
    • Organizer
      SIG Technical Report on Spoken Language Processing (SLP)
    • Related Report
      2017 Annual Research Report
  • [Presentation] Multi-Task Autoencoder for Noise-Robust Speech Recognition (2018)

    • Author(s)
      Haoyi Zhang, Conggui Liu, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ICASSP
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Speaker Separation in Multi-Channel Environment Using Deep Learning (2017)

    • Author(s)
      Conggui Liu, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
    • Place of Presentation
      Kotohira Grand Hotel Sakura-no-Sho, Kotohira, Kagawa Prefecture
    • Year and Date
      2017-02-17
    • Related Report
      2016 Annual Research Report
  • [Presentation] Video Information Retrieval (2017)

    • Author(s)
      Koichi Shinoda
    • Organizer
      The 2017 IEEE SPS Summer School on Visual Image Search and Visual Analytics (VISVA2017)
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Multimodal Speech Recognition by a Deep Autoencoder Using Depth Images of the Lips (2017)

    • Author(s)
      Yuki Yasui, Koji Iwano, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      IPSJ SIG Technical Report (SLP)
    • Related Report
      2017 Annual Research Report
  • [Presentation] Joint training of speaker separation and speech recognition based on deep learning (2017)

    • Author(s)
      Conggui Liu, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ASJ 2017 Autumn Meeting
    • Related Report
      2017 Annual Research Report
  • [Presentation] Multimodal Speech Recognition Based on a Deep Autoencoder Using Lip Depth Images (2017)

    • Author(s)
      Yuki Yasui, Koji Iwano, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      Proceedings of the 2017 Autumn Meeting of the Acoustical Society of Japan
    • Related Report
      2017 Annual Research Report
  • [Presentation] Application of Deep Learning to Speech Recognition (2017)

    • Author(s)
      Koichi Shinoda
    • Organizer
      IPSJ Seminar Series 2017, Session 4: Applications and Infrastructure of Deep Learning
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos (2017)

    • Author(s)
      Mengxi Lin, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ACM Multimedia Thematic Workshop
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers (2017)

    • Author(s)
      Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Satoshi Kanai, Junsuke Masada, Chihiro Shiraishi, Shi-wook Lee, Koichi Shinoda
    • Organizer
      TRECVID workshop
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multimodal Speech Recognition Using Mouth Images from Depth Camera (2017)

    • Author(s)
      Yuki Yasui, Nakamasa Inoue, Koji Iwano, Koichi Shinoda
    • Organizer
      APSIPA
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings (2017)

    • Author(s)
      Conggui Liu, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      APSIPA
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Toward Fast and Resource-Efficient Deep Learning (2017)

    • Author(s)
      Koichi Shinoda
    • Organizer
      JST-NSF International Collaboration Symposium
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model (2017)

    • Author(s)
      Mengxi Lin, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      Technical Reports of IEICE PRMU
    • Related Report
      2017 Annual Research Report
  • [Presentation] Video Semantic Indexing and Localization (2016)

    • Author(s)
      Koichi Shinoda
    • Organizer
      5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan
    • Place of Presentation
      Hilton Hawaiian Village, Honolulu, USA
    • Year and Date
      2016-11-28
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] TokyoTech at TRECVID 2016 (2016)

    • Author(s)
      Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Koichi Shinoda
    • Organizer
      NIST TRECVID workshop
    • Place of Presentation
      NIST, Gaithersburg, MD, USA
    • Year and Date
      2016-11-14
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Adaptation of Word Vectors using Tree Structure for Visual Semantics (2016)

    • Author(s)
      Nakamasa Inoue, Koichi Shinoda
    • Organizer
      ACM Multimedia 2016
    • Place of Presentation
      Theater Tuschinski, Amsterdam
    • Year and Date
      2016-10-15
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Concept Elimination for Zero-Shot Event Detection (2016)

    • Author(s)
      Tran Hai Dang, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      The 22nd Symposium on Sensing via Image Information (SSII)
    • Place of Presentation
      Pacifico Yokohama Annex Hall, Yokohama
    • Year and Date
      2016-06-08
    • Related Report
      2016 Annual Research Report
  • [Presentation] Deep Learning for Speech, Image, and Video (2016)

    • Author(s)
      Koichi Shinoda
    • Organizer
      International Conference on Computer, Control, Informatics, and Its Applications (IC3INA)
    • Place of Presentation
      Indonesia Convention Exhibition (ICE), Tangerang, Indonesia
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] A Case Study of Using Tokyo Tech's TSUBAME: Deep Learning for Multimedia Recognition (2016)

    • Author(s)
      Koichi Shinoda
    • Organizer
      GTC Japan 2016
    • Place of Presentation
      Hilton Tokyo Odaiba, Minato-ku, Tokyo
    • Related Report
      2016 Annual Research Report
    • Invited
  • [Book] Speech Recognition (Machine Learning Professional Series) (2017)

    • Author(s)
      Koichi Shinoda
    • Total Pages
      165
    • Publisher
      Kodansha
    • ISBN
      9784061529274
    • Related Report
      2017 Annual Research Report


Published: 2016-04-21   Modified: 2020-03-30  


Powered by NII kakenhi