Linking Vision and Language through Computational Modelling
Project/Area Number | 19K12733 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 90030: Cognitive science-related |
Research Institution | Kobe City University of Foreign Studies |
Principal Investigator | CHANG Franklin, Kobe City University of Foreign Studies, Faculty of Foreign Studies, Professor (60827343) |
Project Period (FY) | 2019-04-01 – 2024-03-31 |
Project Status | Completed (Fiscal Year 2023) |
Budget Amount | ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000) |
Fiscal Year 2023: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2022: ¥390,000 (Direct Cost: ¥300,000, Indirect Cost: ¥90,000)
Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2020: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2019: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords | action understanding / deep learning / Japanese verbs / Vision / Language / Learning / Event understanding / Computational model / Priming / Verbs / Syntax / Eyetracking / thematic roles / object tracking / connectionist model |
Outline of Research at the Start |
The first project will develop a computational model that can explain behavioral data from both adults and children in multiple object tracking tasks. This model will then be extended to address motion understanding. The next project will link the resulting computational model of action understanding to language. To test the model, we will run a series of eye-tracking studies that examine its key assumptions. A toy sketch of the multiple object tracking task is given below.
|
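The project description does not specify how the tracking model works, so the following is only an illustrative sketch of the multiple object tracking paradigm it refers to: dots follow random walks and a tracker with a fixed attentional capacity follows a subset of them by nearest-neighbour matching. All parameter names, values, and the nearest-neighbour rule are assumptions for illustration, not the project's model.

```python
# Toy multiple object tracking (MOT) simulation, illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def simulate_mot(n_objects=8, n_targets=4, n_steps=100, noise=0.05, capacity=4):
    """Return the proportion of targets still correctly tracked at the end."""
    positions = rng.uniform(0, 1, size=(n_objects, 2))   # true object positions
    tracked = positions[:n_targets].copy()               # tracker's estimates of the targets
    for _ in range(n_steps):
        positions += rng.normal(0, noise, size=positions.shape)  # objects move
        # The tracker updates at most `capacity` estimates by snapping each to
        # the nearest object, which may be a distractor (an identity swap).
        for i in range(min(n_targets, capacity)):
            dists = np.linalg.norm(positions - tracked[i], axis=1)
            tracked[i] = positions[np.argmin(dists)]
    # A target counts as tracked if the estimate ends on the correct object.
    correct = sum(
        int(np.argmin(np.linalg.norm(positions - tracked[i], axis=1)) == i)
        for i in range(n_targets)
    )
    return correct / n_targets

for k in (2, 4, 6):
    print(k, simulate_mot(n_targets=k))
```

Running the loop with increasing numbers of targets shows accuracy falling as the tracking load exceeds the assumed capacity, which is the kind of behavioral pattern such a model would need to explain.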
Outline of Annual Research Achievements |
We published a paper describing experiments and a computational model of action understanding. Videos of simple actions (e.g., climbing a wall) were created in a 3D video game engine, and Japanese adults and children were asked to describe these scenes. We then developed a deep learning model that learned about actions by tracking the multiple body parts of the animated figures in the videos. This body-part information was paired with the Japanese verbs used to describe the videos, and the model learned to produce Japanese verbs from the video input. It could also use endstate information in the visual scene to select between past tense and present progressive forms. The model made predictions that were tested in a final experiment. A minimal sketch of this kind of video-to-verb mapping is given below.
|
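The report does not give the architecture of the published model, so the following is a minimal sketch of the general idea, assuming an LSTM encoder over per-frame body-part coordinates with two classification heads, one for the Japanese verb and one for tense/aspect (past vs. present progressive). All layer sizes, names, and the choice of an LSTM are illustrative assumptions, not the published model.

```python
# Minimal sketch: body-part trajectories -> (verb, tense) predictions.
import torch
import torch.nn as nn

class ActionToVerbModel(nn.Module):
    def __init__(self, n_body_parts=15, n_verbs=30, hidden_size=128):
        super().__init__()
        input_size = n_body_parts * 3          # x, y, z per tracked body part
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.verb_head = nn.Linear(hidden_size, n_verbs)   # which verb to produce
        self.tense_head = nn.Linear(hidden_size, 2)        # past vs. present progressive

    def forward(self, frames):
        # frames: (batch, time, n_body_parts * 3) body-part coordinates per video frame
        _, (h_n, _) = self.encoder(frames)
        final_state = h_n[-1]                  # summary of the action, including its endstate
        return self.verb_head(final_state), self.tense_head(final_state)

# Usage on dummy data: 4 videos, 60 frames each, 15 tracked body parts.
model = ActionToVerbModel()
dummy_videos = torch.randn(4, 60, 15 * 3)
verb_logits, tense_logits = model(dummy_videos)
print(verb_logits.shape, tense_logits.shape)   # torch.Size([4, 30]) torch.Size([4, 2])
```

The design reflects the description above: the same encoded representation of the tracked body parts feeds both verb selection and tense selection, so endstate information available at the end of the sequence can influence the past versus progressive choice.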