Outline of Annual Research Achievements
We take Wikipedia featured articles and venue photos as basic knowledge to learn a deep correlation model for fine-grained venue discovery from Foursquare photos. Specifically, we address the challenging research problem of venue discovery from a multimodal dataset: given a photo (with a rough position) as input, the system returns the exact venue name (i.e., the venue in which the photo was taken), its category, and a textual description. This work presents the first study of visual venue discovery over an integrated venue-related multimodal dataset. We proposed a novel deep correlation learning framework to realize fine-grained venue discovery. In particular, we apply a deep learning model, deep canonical correlation analysis (DCCA), to learn the correlations between venue photos and venue descriptions obtained from Wikipedia and Foursquare. Our contribution is three-fold: i) a novel venue-related multimodal dataset is created by integrating venue photos and descriptions from Wikipedia and Foursquare, for academic research on fine-grained venue discovery; ii) an end-to-end deep network with two branches (a CNN for photos and Doc2vec for descriptions) is trained, which maps the two views into a shared space and maximizes their correlation there; and iii) extensive experiments verify the practicability of the DCCA model for fine-grained venue discovery, where DCCA outperforms state-of-the-art methods such as KCCA [4]. Part of the dataset is available at http://research.nii.ac.jp/_yiyu/VenueNet.htm for research purposes.
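As an illustration of the objective behind this framework, the sketch below computes linear canonical correlations between two already-extracted feature views (e.g., CNN photo embeddings and Doc2vec description embeddings) with NumPy. This is a minimal sketch under our own assumptions: the function name and the regularization constant are ours, and the full DCCA model additionally learns the nonlinear branch networks whose outputs feed this correlation computation.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-4):
    """Canonical correlations between two feature views.

    X: (n, dx) view-1 features (e.g., CNN photo embeddings)
    Y: (n, dy) view-2 features (e.g., Doc2vec text embeddings)
    Returns the singular values of the whitened cross-covariance
    matrix, i.e., the per-dimension correlations DCCA maximizes.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center each view
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Whitened cross-covariance: Sxx^{-1/2} Sxy Syy^{-1/2}
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False)
```

For two views related by an invertible linear map, all canonical correlations are close to 1; for independent views they are close to 0, which is the contrast the training objective exploits.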
Strategy for Future Research Activity
There is still much room for improvement, as follows: i) Based on our dataset, further topics will be investigated, e.g., cross-modal retrieval and image question answering, where visual objects are described in natural language at different levels of understanding. ii) We will keep enlarging the number of image-text pairs per venue to investigate further questions, for example: is there any correlation between visual objects and venue categories? iii) We will try to incorporate more data domains, such as check-ins and tips, for personalized venue recommendation. iv) We will investigate more deep learning methods, such as long short-term memory (LSTM) networks, for processing Wikipedia articles.
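As a rough illustration of item iv, a single LSTM cell step over a word-embedding sequence can be sketched in NumPy as below. This is a generic textbook formulation, not the specific model we will adopt; the weight shapes and gate ordering are our own assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step.

    x: (d,) input (e.g., a word embedding from a Wikipedia article)
    h: (k,) previous hidden state, c: (k,) previous cell state
    W: (4k, d), U: (4k, k), b: (4k,) stacked gate parameters
    Gate order in the stacked vector: input, forget, output, candidate.
    """
    z = W @ x + U @ h + b
    k = h.shape[0]
    i, f, o, g = z[:k], z[k:2 * k], z[2 * k:3 * k], z[3 * k:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # gated memory update
    h_new = sigmoid(o) * np.tanh(c_new)                # exposed hidden state
    return h_new, c_new
```

Running this step over the tokens of an article yields a fixed-size hidden state, which could serve as the text-branch representation in place of Doc2vec.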
Reason for Carryover of Budget to the Next Fiscal Year
Last year, I planned to attend ACMMM17, held in Mountain View, CA, USA. Since I had to prepare for my courses, I did not have time to travel to the USA. This year, we have submitted papers to ACMMM18 and plan to submit a paper to ICDM18. I would therefore like to use this budget to support myself or my student in attending ACMMM18, held in Seoul, Korea, or IEEE ICDM2018, held in Singapore.