2019 Fiscal Year Research-status Report
Zero-shot recognition of generic objects
Project/Area Number | 19K24344
Research Institution | Kobe University
Principal Investigator |
Project Period (FY) | 2019-08-30 – 2021-03-31
Keywords | Zero-Shot Learning / Feature Extraction / Semantic Representations / Resource Efficiency / CNN
Outline of Annual Research Achievements
My initial efforts have focused on reducing the memory cost of training Convolutional Neural Networks (CNNs) for the visual feature extraction step of the Zero-Shot Learning system. To this end, I investigated a family of architectures built from submodules whose computations either admit an analytical inverse or whose inverse can be recovered at minimal memory cost. Using these analytical inverses, the hidden activations needed to compute the network's weight gradients can be reconstructed and propagated backwards together with the gradient during the backpropagation step, bypassing the need to keep these activations in memory. I characterized and precisely quantified the numerical errors that arise in the inverse reconstructions within long chains of invertible modules, and used this analysis to drastically reduce the GPU memory cost of training CNNs. A preliminary version of this analysis was presented at the Neural Architects Workshop of the International Conference on Computer Vision 2019 in Seoul; a more complete exposition is currently under review at the EURASIP Journal on Image and Video Processing.
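The mechanism can be illustrated with a minimal sketch, assuming a RevNet-style additive coupling block, one common member of the invertible family described above (the exact architecture studied here is not specified in this report, and all module names are illustrative). The final lines also demonstrate, empirically, the kind of reconstruction error that accumulates over a long chain of inverses, which is the quantity the numerical analysis characterizes.

```python
# Minimal sketch, assuming a RevNet-style additive coupling block.
# Names (ReversibleBlock, f, g) are illustrative, not the report's code.
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Splits channels into two halves; forward and inverse are both analytic."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU())
        self.g = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU())

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y1 = x1 + self.f(x2)          # additive coupling: exactly invertible
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)          # inputs are recovered from outputs alone,
        x1 = y1 - self.f(x2)          # so activations need not be stored
        return torch.cat([x1, x2], dim=1)

# Numerical error accumulated over a long chain of inverse reconstructions:
# floating-point round-off makes inverse(forward(x)) differ slightly from x,
# and the discrepancy grows with chain depth.
with torch.no_grad():
    blocks = [ReversibleBlock(16) for _ in range(100)]
    x = torch.randn(1, 16, 32, 32)
    y = x
    for b in blocks:
        y = b(y)
    x_rec = y
    for b in reversed(blocks):
        x_rec = b.inverse(x_rec)
    print("max reconstruction error:", (x - x_rec).abs().max().item())
```

In an actual training loop, a custom autograd function would call inverse() during the backward pass to regenerate each block's input on the fly instead of caching it; this is what removes the activation storage cost.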
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
Although progress has been steady, the focus of this study has shifted slightly from the efficient learning of semantic representations to the resource-efficient learning of visual representations. This shift was motivated by the recent success of self-supervised representation learning. While self-supervision has been used as a training signal for learning semantic representations (word embeddings, knowledge graph embeddings, etc.), the visual features used for ZSL to date have been extracted by CNNs trained on a supervised classification task. I have come to believe that supervised training may over-specialize the visual representations to the auxiliary classification task, and that self-supervised visual features may hold the key to better generalization to the unseen classes of the test set.
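For concreteness, the conventional supervised feature-extraction step referred to above can be sketched as follows; the use of torchvision's ImageNet-pretrained ResNet-50 is an illustrative assumption, and a self-supervised encoder would simply replace the pretrained classifier on the first line.

```python
# Hedged sketch of the conventional supervised feature-extraction step:
# ZSL visual features are typically taken from the penultimate layer of a
# CNN trained for ImageNet classification. ResNet-50 is purely illustrative.
import torch
import torchvision.models as models

cnn = models.resnet50(pretrained=True)    # supervised ImageNet classifier
cnn.fc = torch.nn.Identity()              # drop the class head, keep 2048-d features
cnn.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)  # stand-in batch of preprocessed images
    visual_features = cnn(images)         # shape (8, 2048)
```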
Strategy for Future Research Activity
Future work will focus on integrating the self-supervised visual representations learned in our previous efforts into the end-to-end Zero-Shot Learning system. In particular, the newly extracted visual features should be tested against semantic features extracted from both knowledge bases and unstructured text. To this end, the data extraction pipeline must be completed, together with the semantic feature extraction step.
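As a sketch of what testing visual against semantic features could look like, the following illustrates a simple linear compatibility model that scores unseen classes by cosine similarity in the semantic embedding space; all dimensions, names, and the choice of a linear map are assumptions for illustration, not the project's finalized design.

```python
# Illustrative compatibility model: map visual features into the semantic
# space, then classify by nearest unseen-class embedding. All sizes assumed.
import torch
import torch.nn.functional as F

visual_dim, semantic_dim, num_unseen = 2048, 300, 10

W = torch.randn(visual_dim, semantic_dim, requires_grad=True)  # learned on seen classes
class_embeddings = F.normalize(torch.randn(num_unseen, semantic_dim), dim=1)
# rows: word / knowledge-graph embeddings of the unseen class names

def predict(visual_features):
    projected = F.normalize(visual_features @ W, dim=1)  # map into semantic space
    scores = projected @ class_embeddings.t()            # cosine compatibility
    return scores.argmax(dim=1)                          # best-matching unseen class

print(predict(torch.randn(8, visual_dim)))
```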
Causes of Carryover
While we conjecture that visual representations learned with self-supervision are more amenable to zero-shot recognition, these representations also come at a greater computational cost. Hence, half of the grant money will be allocated to the purchase of a more powerful GPU workstation. Difficulties in finding suitable equipment have delayed this purchase from FY2019 to FY2020. The remainder of the grant money will be allocated either to participation in international workshops or to minor hardware updates and personnel costs for partner students, depending on the evolution of the current virus situation.