Learning of Multimodal Representation and Its Application
Project/Area Number | 26330249
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Research Field | Intelligent informatics
Research Institution | Shinshu University
Principal Investigator |
Project Period (FY) | 2014-04-01 – 2017-03-31
Project Status | Completed (Fiscal Year 2016)
Budget Amount | ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
Fiscal Year 2016: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2015: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2014: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords | image recognition / machine learning / deep learning / image retrieval / neural networks / image classification / network compression / attribute recognition / structure learning / probabilistic topic models / document recognition / multipitch analysis / image memorability
Outline of Final Research Achievements |
We carried out research on multimodal learning and its application to image retrieval. We studied a deep Boltzmann machine (DBM) that jointly represents an image and its corresponding text, and showed that feature vectors obtained from the joint layer yield better classification results than CNN-based features. We also studied an image query method based on multimodal representations, enabled by a visual-semantic embedding model built on a CNN and an LSTM. It allows analogical reasoning over images by specifying, in words, the properties to be added and subtracted, and we introduced a novel similarity measure based on the difference between the additive and subtractive queries. Because our methods depend heavily on CNNs for image processing, we also examined how to reduce their computational cost by compressing a given CNN: we proposed a compression method based on block-wise distillation and evaluated its effectiveness.
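The analogical image query described above can be sketched as arithmetic in a shared visual-semantic embedding space (a minimal toy sketch, not the project's actual model: the real system learns the embedding with a CNN and an LSTM, and the function names and vectors here are illustrative):

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit sphere, as is usual for embedding spaces."""
    return v / np.linalg.norm(v)

def analogical_query(image_emb, add_word_emb, sub_word_emb):
    """Build a query embedding: image + property to add - property to subtract."""
    return normalize(image_emb + add_word_emb - sub_word_emb)

def retrieve(query, gallery):
    """Return the index of the gallery embedding most cosine-similar to the query."""
    gallery = np.stack([normalize(g) for g in gallery])
    return int(np.argmax(gallery @ query))

# Toy 4-d embeddings: axis 0 ~ "red", axis 1 ~ "blue".
red_car  = np.array([1.0, 0.0, 0.0, 0.0])
blue_car = np.array([0.0, 1.0, 0.0, 0.0])
q = analogical_query(red_car, add_word_emb=blue_car, sub_word_emb=red_car)
print(retrieve(q, [red_car, blue_car]))  # retrieves the "blue car" image
```

In this toy space, subtracting the "red" direction and adding the "blue" direction moves the query from the red-car image toward the blue-car image, which is the kind of word-specified add/subtract reasoning the summary describes.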
Report (4 results)
Research Products (8 results)