2018 Fiscal Year Annual Research Report

Multimodal time-sequence data recognition platform based on deep learning

Research Project

Project/Area Number	16H02845
Research Institution	Tokyo Institute of Technology
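Principal Investigator	篠田浩一, Tokyo Institute of Technology, School of Computing, Professor (10343097)
Co-Investigator (Kenkyū-buntansha)	井上中順, Tokyo Institute of Technology, School of Computing, Assistant Professor (10733397); 岩野公司, Tokyo City University, Faculty of Informatics, Professor (90323823)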
Project Period (FY)	2016-04-01 – 2019-03-31
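Keywords	Perceptual information processing / Speech information processing / Video information processing / Deep learning
Outline of Annual Research Achievements	This project aims to build a high-performance information retrieval platform, based on deep learning, for extracting useful information from multimodal time-series data such as speech and video. In particular, it aims to realize end-to-end speech and video processing. In speech research, separately from conventional work targeting speech recognition, we worked on acquiring paralinguistic information, such as emotion and health condition, from speech. We developed a method that uses a gated CNN in place of the RNN conventionally used in speech recognition, which enables efficient training with a smaller amount of training data. The method achieved high classification performance in diagnosing dementia from speech (without using transcriptions), confirming that gated CNNs are effective for recognizing time-series data. These results were presented at the international conference Interspeech 2018. In addition, in research on conventional end-to-end speech recognition, we implemented an attention mechanism in an RNN and compressed the attention-based RNN by distillation. The resulting model is one-tenth the size of the previous one, with the degradation in recognition rate limited to 7%. For video recognition, we developed a method that divides a video into segments and applies a CNN to each segment. In particular, we applied a CNN specialized for the human skeletal structure to gesture recognition from depth-camera video. In an evaluation on a database containing videos captured from different camera angles, the method achieved the best performance in the world at that time. These results were presented at the international conference BMVC 2018.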
Research Progress Status	Not entered because FY2018 is the final year of the project.
Strategy for Future Research Activity	Not entered because FY2018 is the final year of the project.

Research Products
(17 results)

All Presentation (17 results) (of which Int'l Joint Research: 8 results, Invited: 2 results)

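[Presentation] The Present and Future of Information Science and Engineering (情報理工学の現状と将来) (2019)
- Author(s)
  篠田浩一
- Organizer
  The 40th Kuramae Science and Technology Seminar (第40回蔵前科学技術セミナー)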
- Invited
[Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2019)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  IPSJ SIG Technical Reports, SLP
[Presentation] A robust algorithm of phase recovery for speech enhancement (2019)
- Author(s)
  Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
- Organizer
  IEICE Technical Report, SP
[Presentation] Improving the robustness of multiple input spectrogram inversion (2019)
- Author(s)
  Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
- Organizer
  Proceedings of the ASJ 2019 Spring Meeting
[Presentation] SEQUENCE-LEVEL KNOWLEDGE DISTILLATION FOR MODEL COMPRESSION OF ATTENTION-BASED SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION (2019)
- Author(s)
  Raden Mu’az Mun’im, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ICASSP2019
- Int'l Joint Research
[Presentation] Co-Design for Deep Learning (深層学習のためのCo-Design) (2018)
- Author(s)
  篠田浩一
- Organizer
  IEICE Technical Report, SP/PRMU
- Invited
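[Presentation] Event Detection from Video Using Distributed Word Representations (単語分散表現を用いた動画からのイベント検出) (2018)
- Author(s)
  金井怜, 井上中順, 李時旭, 篠田浩一
- Organizer
  The 21st Meeting on Image Recognition and Understanding (MIRU)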
[Presentation] Astronomical Image Subtraction for Transient Detection Using CNN (2018)
- Author(s)
  Yan Long, Nakamasa Inoue, Koichi Shinoda, Yoichi Yatsu, Ryosuke Itoh, Nobuyuki Kawai
- Organizer
  The 21st Meeting on Image Recognition and Understanding (MIRU)
[Presentation] Alzheimer's Disease Prediction Using Audio Gated Convolutional Neural Network (2018)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ASJ 2018 Autumn Meeting
[Presentation] Generative Adversarial Network Based i-Vector Transformation for Short Utterance Speaker Verification (2018)
- Author(s)
  Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
- Organizer
  ASJ 2018 Autumn Meeting
[Presentation] A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition (2018)
- Author(s)
  Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
- Organizer
  British Machine Vision Conference (BMVC)
- Int'l Joint Research
[Presentation] Detecting Alzheimer's Disease Using Gated Convolutional Neural Network from Audio Data (2018)
- Author(s)
  Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Interspeech
- Int'l Joint Research
[Presentation] I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification (2018)
- Author(s)
  Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Interspeech
- Int'l Joint Research
[Presentation] Few-Shot Adaptation for Multimedia Semantic Indexing (2018)
- Author(s)
  Nakamasa Inoue, Koichi Shinoda
- Organizer
  ACM Multimedia
- Int'l Joint Research
[Presentation] VANT at TRECVID 2018 (2018)
- Author(s)
  Nakamasa Inoue, Chihiro Shiraishi, Aleksandr Drozd, Koichi Shinoda, Shi-wook Lee, Alex Chichung Kot
- Organizer
  TRECVID workshop
- Int'l Joint Research
[Presentation] Skeleton-based Human Action Recognition with Fine-to-Coarse Convolutional Neural Network (2018)
- Author(s)
  Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
- Organizer
  Technical Reports of IEICE PRMU
- Int'l Joint Research
[Presentation] The NEC-TT Speaker Verification System for SRE’18 (2018)
- Author(s)
  K. A. Lee, H. Yamamoto, K. Okabe, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda
- Organizer
  NIST 2018 Speaker Recognition Evaluation
- Int'l Joint Research
