2023 Fiscal Year Final Research Report
Linking Vision and Language through Computational Modelling
Project/Area Number | 19K12733
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Review Section | Basic Section 90030: Cognitive science-related
Research Institution | Kobe City University of Foreign Studies
Principal Investigator | Chang Franklin, Kobe City University of Foreign Studies, Department of English and American Studies, Professor (60827343)
Project Period (FY) | 2019-04-01 – 2024-03-31
Keywords | visual information / deep learning model / verbs / past tense / progressive form / endstate / children / adults
Outline of Final Research Achievements |
Language is used to describe events that we see, but the relationship between visual and linguistic representations is still not well understood. In this research, we focused on the visual cues that are used to select the past tense (ran) and the progressive aspect (is running). We created videos in which human-like characters performed actions such as running. We then added objects to the scenes that signaled that the endstate of the action had been reached. We found that both Japanese adults and 3-5-year-old children used the past tense more when the videos contained endstate information than when they did not. To understand how viewers mapped these visual signals onto language, we developed a deep learning model that tracked the motion of body parts and objects in the videos and used that information to generate Japanese verbs. The model could explain our data, and it made predictions that were confirmed in a follow-up experiment. This work demonstrates that vision and language are tightly linked.
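The report describes a model that tracks the motion of body parts and objects and uses those cues to generate Japanese verb forms. As a purely illustrative sketch (not the authors' actual architecture; the function names, features, and threshold below are hypothetical), the core idea of combining a motion cue with an endstate cue to choose between past and progressive forms might look like:

```python
def motion_energy(keypoints):
    """Total frame-to-frame displacement of tracked keypoint coordinates.

    `keypoints` is a list of frames, each frame a list of coordinates.
    """
    total = 0.0
    for prev, cur in zip(keypoints, keypoints[1:]):
        total += sum(abs(c - p) for p, c in zip(prev, cur))
    return total


def choose_verb_form(keypoints, endstate_visible):
    """Pick a verb form from motion and endstate cues (illustrative only).

    Little residual motion at the end of the clip plus a visible endstate
    object suggests a completed event, favoring the past tense
    (e.g. hashitta); otherwise the progressive (e.g. hashitte iru).
    The threshold 1.0 is an arbitrary placeholder, not a fitted value.
    """
    if endstate_visible and motion_energy(keypoints[-3:]) < 1.0:
        return "past"
    return "progressive"
```

In the actual study, a deep learning model learned this mapping from video data rather than applying a hand-set rule; the sketch only makes the input-output relationship concrete.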
Free Research Field | Psycholinguistics
Academic Significance and Societal Importance of the Research Achievements |
This study examined how adults and children use visual information from videos to generate verbs and verb forms. We developed a computational AI model that shows how Japanese speakers produce verbs from visual information. This model can help in creating visual materials that support first- and second-language acquisition. By helping to clarify how humans put visual information into words, this research is also useful for building Japanese-speaking AI systems.