2021 Fiscal Year Final Research Report

Building a Video Search Engine based on the Perception of Spatio-temporal Relations

Project/Area Number	19K12028
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Kindai University
Principal Investigator	Shirahama Kimiaki, Kindai University, Faculty of Science and Engineering, Associate Professor (30467675)
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	video retrieval / spatio-temporal relations among objects / graph convolution / memory transfer / reinforcement learning / TRECVID
Outline of Final Research Achievements	This project addressed three main topics: 1) video retrieval that considers spatio-temporal relations among objects, 2) extraction of temporal features from a video by considering the continuity of its semantic content, and 3) learning a model that captures the human memory mechanism for the frames of a video. For the third topic in particular, a reinforcement learning method was developed to train a model built on a memory defined as a finite external storage; the model learns to update this memory so as to achieve the best possible understanding of the video's content. The method was also extended to data mining: given a dataset containing a large number of items, a model is trained to update a set of items so that it forms statistically characteristic sets.
Free Research Field	Multimedia information processing
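Academic Significance and Societal Importance of the Research Achievements	Although the introduction of deep learning has greatly improved image recognition performance, video recognition has not seen comparable gains. One reason is that, while the operations of the convolutional neural networks (CNNs) used in image recognition match human perceptual mechanisms well, the operations of existing models for time-dependent video, such as long short-term memory (LSTM), do not match the human mechanism of temporal perception. To address this problem, the project used reinforcement learning, a technique grounded in behavioral psychology that is useful for modeling human decision making, to train a model that imitates the memory transfer mechanism needed to understand video content appropriately, and showed its effectiveness experimentally. This is the academic significance of the work.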
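
The memory mechanism described in this report, a finite external storage that a model learns to update frame by frame, can be framed as a sequential decision problem. The sketch below is only an illustration of that framing, not the project's method: the report does not give the model architecture, the reward, or the training algorithm. The names `FiniteMemory`, `reward`, `run_episode`, and `greedy_policy` are hypothetical, frames are reduced to plain content labels, the reward (number of distinct labels held in memory) is a toy stand-in for "understanding of contents", and the greedy policy stands in for a policy that would actually be trained with reinforcement learning.

```python
class FiniteMemory:
    """Fixed-capacity external memory over frame features (hypothetical design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []

    def apply(self, action, frame):
        """Action `capacity` means 'skip'; any smaller action writes to that slot."""
        if action == self.capacity:
            return
        if action < len(self.slots):
            self.slots[action] = frame   # overwrite an occupied slot
        else:
            self.slots.append(frame)     # fill the next free slot


def reward(memory):
    """Toy reward (assumption): number of distinct content labels in memory."""
    return len(set(memory.slots))


def greedy_policy(memory, frame):
    """Stand-in for a learned policy: take the action with the best immediate reward."""
    best_action, best_reward = memory.capacity, reward(memory)  # default: skip
    for action in range(memory.capacity):
        trial = FiniteMemory(memory.capacity)
        trial.slots = list(memory.slots)
        trial.apply(action, frame)
        if reward(trial) > best_reward:
            best_action, best_reward = action, reward(trial)
    return best_action


def run_episode(frames, capacity, policy):
    """Feed frames one at a time; the policy decides how the memory is updated."""
    memory = FiniteMemory(capacity)
    total = 0
    for frame in frames:
        action = policy(memory, frame)
        memory.apply(action, frame)
        total += reward(memory)
    return memory, total


memory, total = run_episode(["a", "a", "b", "c", "b"], capacity=2, policy=greedy_policy)
print(memory.slots, total)
```

In a reinforcement learning setting, `greedy_policy` would be replaced by a parameterized policy whose parameters are adjusted to maximize the cumulative reward returned by `run_episode`. The same loop carries over to the data mining extension mentioned in the outline if frames are read as items and the reward as a statistic of the current item set.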