Summary of Research Achievements
In this academic year, efforts were focused on core structural questions of Zero-Shot Learning (ZSL): we focused our work on semantic representations and on re-thinking the methodology behind ZSL benchmarks. Regarding semantic representations: the past year has seen a strong trend towards leveraging large language models to process visual captions from web-scale image collections, and successfully using these representations as a training signal for visual models. This line of work echoes some of our previous work leveraging image-captioning datasets to achieve zero-shot classification, albeit with much better quantitative results. We focused our efforts on estimating whether the strong classification abilities of these new models from Google and OpenAI are due to the new scale of the data used in training or to the representational abilities of large language models. Regarding the methodology behind ZSL benchmarks, we found two things. On the one hand, the dimensioning of standard ZSL benchmarks does not allow for the development of combinatorial generalization across classes, due to the limited number of visual classes they define; the web-scale supervision used in the work mentioned above does remedy this shortcoming. On the other hand, web-scale supervision provides implicit information about the test classes used to evaluate zero-shot learning abilities. In a yet-unpublished work, we found that different methodologies might allow measuring combinatorial generalization in a fair setting.
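To make the mechanism behind this line of work concrete, the following is a minimal, self-contained sketch of how language-derived class representations enable zero-shot classification: an image embedding is compared against text embeddings of class names, and the nearest class wins. All embeddings here are toy hand-written vectors standing in for the outputs of learned image and text encoders; the function names and values are illustrative assumptions, not the actual models discussed above.

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, class_embs):
    # pick the class whose text embedding is most similar to the image embedding;
    # classes never seen during visual training can be added just by embedding their names
    return max(class_embs, key=lambda c: cosine(image_emb, class_embs[c]))

# toy "text encoder" outputs for three class names (hypothetical values)
class_embs = {
    "zebra": [0.9, 0.1, 0.0],
    "horse": [0.7, 0.6, 0.1],
    "tiger": [0.1, 0.9, 0.2],
}

# toy "image encoder" output for one test image (hypothetical values)
image = [0.85, 0.2, 0.05]

print(zero_shot_classify(image, class_embs))  # prints "zebra"
```

Because the class set is defined only by text embeddings, the number of evaluable classes is bounded by the text encoder rather than by the visual training labels, which is precisely why web-scale caption supervision changes the dimensioning of ZSL benchmarks discussed above.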