Multimodal time-sequence data recognition platform based on deep learning
Project/Area Number | 16H02845 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Perceptual information processing |
Research Institution | Tokyo Institute of Technology |
Principal Investigator | Shinoda Koichi, Tokyo Institute of Technology, School of Computing, Professor (10343097) |
Co-Investigator(Kenkyū-buntansha) |
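Inoue Nakamasa, Tokyo Institute of Technology, School of Computing, Assistant Professor (10733397)
Iwano Koji, Tokyo City University, Faculty of Media Informatics, Professor (90323823)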
|
Project Period (FY) | 2016-04-01 – 2019-03-31 |
Project Status | Completed (Fiscal Year 2018) |
Budget Amount |
¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000)
Fiscal Year 2018: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2017: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
Fiscal Year 2016: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000)
|
Keywords | Perceptual information processing / Speech information processing / Video information processing / Deep learning |
Outline of Final Research Achievements |
This research aims to accurately recognize multimodal time-sequence signals using deep learning. We applied several deep learning techniques: end-to-end training, deep networks that can be trained from a small amount of data, multi-task learning, and noise-robust recognition. In particular, we improved recognition and detection performance in four areas: joint training of source separation and speech recognition, dementia detection from speech, multimodal speech recognition using lip reading, and noise-robust speech recognition.
|
Academic Significance and Societal Importance of the Research Achievements |
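Over the past decade or so, deep learning has become the standard technology for image recognition and speech recognition. However, many challenges remain: exploiting the prior knowledge that humans possess, performance degradation caused by differences in the surrounding environment or between speakers, and application to tasks for which large amounts of training data cannot be obtained. In this research, we proposed methods that are key to solving these problems, namely end-to-end training, efficient model training from small amounts of data, multi-task learning, and noise-robust recognition, and obtained a certain degree of success. These results can readily be applied to a variety of real-world problems.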
|
Report
(4 results)
Research Products
(41 results)
-
[Presentation] VANT at TRECVID 2018 (2018)
Author(s)
Nakamasa Inoue, Chihiro Shiraishi, Aleksandr Drozd, Koichi Shinoda, Shi-wook Lee, Alex Chichung Kot
Organizer
TRECVID workshop
Related Report
Int'l Joint Research
-
[Presentation] Video Semantic Indexing and Localization (2016)
Author(s)
Koichi Shinoda
Organizer
5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan
Place of Presentation
Hilton Hawaiian Village, Honolulu, USA
Year and Date
2016-11-28
Related Report
Int'l Joint Research / Invited
-
[Presentation] TokyoTech at TRECVID 2016 (2016)
Author(s)
Nakamasa Inoue, Ryosuke Yamamoto, Na Rong, Koichi Shinoda
Organizer
NIST TRECVID workshop
Place of Presentation
NIST, Gaithersburg, MD, USA
Year and Date
2016-11-14
Related Report
Int'l Joint Research / Invited
-
[Presentation] Deep Learning for Speech, Image, and Video (2016)
Author(s)
Koichi Shinoda
Organizer
International Conference on Computer, Control, Informatics, and Its Applications (IC3INA)
Place of Presentation
Indonesia Convention Exhibition (ICE), Tangerang, Indonesia
Related Report
Int'l Joint Research / Invited