2021 Fiscal Year Final Research Report

Formal Semantic Representations to Link Language and Vision

Project/Area Number	18H03268
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	The University of Tokyo
Principal Investigator	Miyao Yusuke The University of Tokyo, Graduate School of Information Science and Technology, Professor (00343096)
Project Period (FY)	2018-04-01 – 2021-03-31
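Keywords	semantic representation / natural language processing / image processing
Outline of Final Research Achievements	This research explored semantic representations for images, with the aim of applying semantic analysis technologies developed for natural language to visual information. Specifically, we developed a method for linking entities in an input image to database IDs and a technique for compositionally constructing semantic representations of images. In addition, we designed a new task of generating a caption from an image and a fragment of a semantic representation, and showed the effectiveness of using semantic representations for images.
Free Research Field	Natural language processing
Academic Significance and Societal Importance of the Research Achievements	Technologies that connect images and language have been studied extensively in recent years, but most of them train deep learning models that take images and language as input and output. This approach achieves high accuracy on many tasks when large-scale training data is available, but it is difficult to apply when no training data exists or when the task is an advanced one that requires external knowledge or reasoning. If semantic representations can be obtained for images, as in the proposed method, this opens the way to applying natural language processing technologies that use semantic representations, and can be expected to develop into a variety of technologies.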