Multimodal time-sequence data recognition platform based on deep learning

Research Project

Project/Area Number	16H02845
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perceptual information processing
Research Institution	Tokyo Institute of Technology
Principal Investigator	Shinoda Koichi, Tokyo Institute of Technology, School of Computing, Professor (10343097)
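Co-Investigator(Kenkyū-buntansha)	Inoue Nakamasa, Tokyo Institute of Technology, School of Computing, Assistant Professor (10733397); Iwano Koji, Tokyo City University, Faculty of Informatics, Professor (90323823)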
Project Period (FY)	2016-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000); Fiscal Year 2018: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000); Fiscal Year 2017: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000); Fiscal Year 2016: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000)
Keywords	Perceptual information processing / Speech information processing / Video information processing / Deep learning
Outline of Final Research Achievements	This research aimed to accurately recognize multimodal time-sequence signals using deep learning. We applied several deep learning techniques: end-to-end training, deep networks trainable with a small amount of data, multi-task learning, and noise-robust recognition. In particular, we improved recognition and detection performance in joint training of source separation and speech recognition, dementia detection from speech, multimodal speech recognition using lip reading, and noise-robust speech recognition.
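Academic Significance and Societal Importance of the Research Achievements	Over the past decade or so, deep learning has become a standard technology for image recognition and speech recognition. However, many challenges remain, such as exploiting human prior knowledge, performance degradation caused by differences in the surrounding environment or between speakers, and application to tasks for which large amounts of training data are unavailable. In this research, we proposed methods for end-to-end training, efficient model training from small amounts of data, multi-task learning, and noise-robust recognition, which are key to solving these problems, and obtained a certain degree of success. These results can be readily applied to a variety of real-world problems.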

Report

(4 results)

2018 Annual Research Report
Final Research Report (PDF)
2017 Annual Research Report
2016 Annual Research Report

Research Products
(41 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 2 results, Acknowledgement Compliant: 1 result)
Presentation (37 results) (of which Int'l Joint Research: 17 results, Invited: 9 results)
Book (1 result)

[Journal Article] Deep Learning in Spoken Language Processing: A Review (2017)
- Author(s)
  Koichi Shinoda
- Journal Title
  The Journal of the Acoustical Society of Japan
  Volume: 73 Pages: 25-30
- NAID
  130007355576
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] [Invited Paper] Semantic Indexing for Large-Scale Video Retrieval (2016)
- Author(s)
  Nakamasa Inoue, Koichi Shinoda
- Journal Title
  ITE Transactions on Media Technology and Applications
  Volume: 4 Issue: 3 Pages: 209-217
- DOI
  10.3169/mta.4.209
- NAID
  130005161897
- ISSN
  2186-7364
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Wise Teachers Train Better DNN Acoustic Models (2016)
- Author(s)
  R. Price, K. Iso, K. Shinoda
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2016 Issue: 1 Pages: 1-19
- DOI
  10.1186/s13636-016-0088-7
- NAID
  120006582513
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] The Present and Future of Information Science and Engineering (2019)
- Author(s)
  Koichi Shinoda
- Organizer
  The 40th Kuramae Science and Technology Seminar
- Related Report
  2018 Annual Research Report
- Invited
[Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2019)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  IPSJ SIG Technical Reports, SLP
- Related Report
  2018 Annual Research Report
[Presentation] A robust algorithm of phase recovery for speech enhancement (2019)
- Author(s)
  Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
- Organizer
  IEICE Technical Report, SP
- Related Report
  2018 Annual Research Report
[Presentation] Improving the robustness of multiple input spectrogram inversion (2019)
- Author(s)
  Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
- Organizer
  Proceedings of the ASJ 2019 Spring Meeting
- Related Report
  2018 Annual Research Report
[Presentation] Sequence-Level Knowledge Distillation for Model Compression of Attention-Based Sequence-to-Sequence Speech Recognition (2019)
- Author(s)
  Raden Mu’az Mun’im, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ICASSP2019
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Co-Design for Deep Learning (2018)
- Author(s)
  Koichi Shinoda
- Organizer
  IEICE Technical Report, SP/PRMU
- Related Report
  2018 Annual Research Report
- Invited
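[Presentation] Event Detection from Video Using Distributed Word Representations (2018)
- Author(s)
  Satoshi Kanai, Nakamasa Inoue, Shi-wook Lee, Koichi Shinoda
- Organizer
  The 21st Meeting on Image Recognition and Understanding (MIRU)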
- Related Report
  2018 Annual Research Report
[Presentation] Astronomical Image Subtraction for Transient Detection Using CNN (2018)
- Author(s)
  Yan Long, Nakamasa Inoue, Koichi Shinoda, Yoichi Yatsu, Ryosuke Itoh, Nobuyuki Kawai
- Organizer
  The 21st Meeting on Image Recognition and Understanding (MIRU)
- Related Report
  2018 Annual Research Report
[Presentation] Alzheimer's Disease Prediction Using Audio Gated Convolutional Neural Network (2018)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ASJ 2018 Autumn Meeting
- Related Report
  2018 Annual Research Report
[Presentation] Generative Adversarial Network Based i-Vector Transformation for Short Utterance Speaker Verification (2018)
- Author(s)
  Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ASJ 2018 Autumn Meeting
- Related Report
  2018 Annual Research Report
[Presentation] A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition (2018)
- Author(s)
  Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
- Organizer
  British Machine Vision Conference (BMVC)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2018)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Interspeech
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification (2018)
- Author(s)
  Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Interspeech
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Few-Shot Adaptation for Multimedia Semantic Indexing (2018)
- Author(s)
  Nakamasa Inoue, Koichi Shinoda
- Organizer
  ACM Multimedia
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] VANT at TRECVID 2018 (2018)
- Author(s)
  Nakamasa Inoue, Chihiro Shiraishi, Aleksandr Drozd, Koichi Shinoda, Shi-wook Lee, Alex Chichung Kot
- Organizer
  TRECVID workshop
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Skeleton-based Human Action Recognition with Fine-to-Coarse Convolutional Neural Network (2018)
- Author(s)
  Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Technical Reports of IEICE PRMU
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] The NEC-TT Speaker Verification System for SRE’18 (2018)
- Author(s)
  K. A. Lee, H. Yamamoto, K. Okabe, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda
- Organizer
  NIST 2018 Speaker Recognition Evaluation
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
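[Presentation] Pitch Recognition of Polyphonic Signals Using a Fully Gated 2D Convolutional Network (2018)
- Author(s)
  生田目敬弘, Hirokazu Kameoka, Koichi Shinoda
- Organizer
  IPSJ SIG Technical Reports, Spoken Language Processing (SLP)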
- Related Report
  2017 Annual Research Report
[Presentation] Multi-Task Autoencoder for Noise-Robust Speech Recognition (2018)
- Author(s)
  Haoyi Zhang, Conggui Liu, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ICASSP
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Speaker Separation in Multi-Channel Environment Using Deep Learning (2017)
- Author(s)
  Conggui Liu, Nakamasa Inoue, Koichi Shinoda
- Organizer
  IPSJ SIG on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Kotohira Grand Hotel Sakuranosho, Kotohira, Kagawa
- Year and Date
  2017-02-17
- Related Report
  2016 Annual Research Report
[Presentation] Video Information Retrieval (2017)
- Author(s)
  Koichi Shinoda
- Organizer
  The 2017 IEEE SPS Summer School on Visual Image Search and Visual Analytics (VISVA2017)
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Multimodal Speech Recognition with a Deep Autoencoder Using Lip Depth Images (2017)
- Author(s)
  Yuki Yasui, Koji Iwano, Nakamasa Inoue, Koichi Shinoda
- Organizer
  IPSJ SIG Technical Reports, SLP
- Related Report
  2017 Annual Research Report
[Presentation] Joint training of speaker separation and speech recognition based on deep learning (2017)
- Author(s)
  Conggui Liu, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ASJ 2017 Autumn Meeting
- Related Report
  2017 Annual Research Report
[Presentation] Multimodal Speech Recognition Based on a Deep Autoencoder Using Lip Depth Images (2017)
- Author(s)
  Yuki Yasui, Koji Iwano, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Proceedings of the ASJ 2017 Autumn Meeting
- Related Report
  2017 Annual Research Report
[Presentation] Applications of Deep Learning to Speech Recognition (2017)
- Author(s)
  Koichi Shinoda
- Organizer
  IPSJ Seminar Series 2017, Session 4: Applications and Foundations of Deep Learning
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos (2017)
- Author(s)
  Mengxi Lin, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ACM Multimedia Thematic Workshop
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] TokyoTech-AIST at TRECVID 2017: Multimedia Event Detection Using Deep CNNs and Zero-Shot Classifiers (2017)
- Author(s)
  Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Satoshi Kanai, Junsuke Masada, Chihiro Shiraishi, Shi-wook Lee, Koichi Shinoda
- Organizer
  TRECVID workshop
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Multimodal Speech Recognition Using Mouth Images from Depth Camera (2017)
- Author(s)
  Yuki Yasui, Nakamasa Inoue, Koji Iwano, Koichi Shinoda
- Organizer
  APSIPA
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] A Unified Network for Multi-Speaker Speech Recognition with Multi-Channel Recordings (2017)
- Author(s)
  Conggui Liu, Nakamasa Inoue, Koichi Shinoda
- Organizer
  APSIPA
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Toward Fast and Resource-Efficient Deep Learning (2017)
- Author(s)
  Koichi Shinoda
- Organizer
  JST-NSF International Collaboration Symposium
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Action Sequence Recognition in Videos by Combining a CTC Network with a Statistical Language Model (2017)
- Author(s)
  Mengxi Lin, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Technical Reports of IEICE PRMU
- Related Report
  2017 Annual Research Report
[Presentation] Video Semantic Indexing and Localization (2016)
- Author(s)
  Koichi Shinoda
- Organizer
  5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan
- Place of Presentation
  Hilton Hawaiian Village, Honolulu, USA
- Year and Date
  2016-11-28
- Related Report
  2016 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] TokyoTech at TRECVID 2016 (2016)
- Author(s)
  Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Koichi Shinoda
- Organizer
  NIST TRECVID workshop
- Place of Presentation
  NIST, Gaithersburg, MD, USA
- Year and Date
  2016-11-14
- Related Report
  2016 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Adaptation of Word Vectors using Tree Structure for Visual Semantics (2016)
- Author(s)
  Nakamasa Inoue, Koichi Shinoda
- Organizer
  ACM Multimedia 2016
- Place of Presentation
  Theater Tuschinski, Amsterdam
- Year and Date
  2016-10-15
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Concept Elimination for Zero-Shot Event Detection (2016)
- Author(s)
  Tran Hai Dang, Nakamasa Inoue, Koichi Shinoda
- Organizer
  The 22nd Symposium on Sensing via Image Information (SSII)
- Place of Presentation
  Pacifico Yokohama Annex, Yokohama
- Year and Date
  2016-06-08
- Related Report
  2016 Annual Research Report
[Presentation] Deep Learning for Speech, Image, and Video (2016)
- Author(s)
  Koichi Shinoda
- Organizer
  International Conference on Computer, Control, Informatics, and Its Applications (IC3INA)
- Place of Presentation
  Indonesia Convention Exhibition (ICE), Tangerang, Indonesia
- Related Report
  2016 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] A Use Case of Tokyo Tech's TSUBAME: Deep Learning for Multimedia Recognition (2016)
- Author(s)
  Koichi Shinoda
- Organizer
  GTC Japan 2016
- Place of Presentation
  Hilton Tokyo Odaiba, Minato-ku, Tokyo
- Related Report
  2016 Annual Research Report
- Invited
[Book] Speech Recognition (Machine Learning Professional Series) (2017)
- Author(s)
  Koichi Shinoda
- Total Pages
  165
- Publisher
  Kodansha
- ISBN
  9784061529274
- Related Report
  2017 Annual Research Report
