Computationally analyzing the hierarchical complexity of infants' social coordination at multiple scales in natural daily life to investigate infants' cognitive development
Project/Area Number | 22K20314
Research Category | Grant-in-Aid for Research Activity Start-up
Allocation Type | Multi-year Fund
Review Section | 0110: Psychology and related fields
Research Institution | The University of Tokyo
Principal Investigator | Li Jiarui, The University of Tokyo, International Research Center for Neurointelligence (WPI-IRCN), Project Researcher (10966807)
Project Period (FY) | 2022-08-31 – 2024-03-31
Project Status | Granted (Fiscal Year 2022)
Budget Amount | ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
  Fiscal Year 2023: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
  Fiscal Year 2022: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Keywords | multi-scale coordination / infant daily interaction / cultural differences / social interaction / infant development / multi-scale analysis
Outline of Research at the Start
How do infants handle the massive stream of cues available in the natural environment to produce responses, even without an understanding of their meaning? This project proposes to use multi-scale analysis to reveal the complexity of infant–caregiver social coordination in natural daily life.
Outline of Annual Research Achievements
This project aims to quantify social cues across modalities and time scales in order to uncover the computational mechanisms that combine them in infants' interactive development. The current achievements address the first two of the three research questions: Q1. Which cues, in which modality and on which time scale, do infants pick up from the social-interactive environment? Q2. How do infants use these cues to interact with the environment within and across time scales? A multi-scale analysis framework was proposed based on prosodic alignment between infants and caregivers in daily interaction, and applied to day-long audio data from a Tseltal corpus. The results revealed that infants' and caregivers' vocalizations are coordinated differently on each time scale. The same analysis was then conducted on a Canadian-English corpus with audio data collected under similar conditions. The results show that maternal prosodic alignment occurs in English but not in Tseltal, while infants' alignment to caregivers is weak but similarly present across cultures. Interestingly, Tseltal dyads were more likely than Canadian ones to show increases in prosodic alignment over longer sequences of interaction. A conference paper and a poster based on these results were presented at ICDL 2022 (Li, J., et al.) and SRCD 2023 (Li, J., et al.). The current findings indicate that infants receive varied auditory cues from the environment at global, intermediate, and local scales (Q1), and that infants respond differently to these cues across time scales (Q2).
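As an illustration of the kind of multi-scale prosodic alignment measure described above, the following Python sketch correlates infant and caregiver mean F0 over windows of increasing size (local to global). It is a minimal sketch under stated assumptions, not the project's actual pipeline: the file name, the utterance tuples, and the window sizes (1, 5, and 25 utterance pairs) are all hypothetical.

# Minimal sketch of a multi-scale prosodic alignment measure.
# Assumptions: utterances are given as (speaker, start, end) tuples over a
# day-long WAV file; this is an illustration, not the project's actual code.
import librosa
import numpy as np
from scipy.stats import pearsonr

def mean_f0(y, sr, start, end):
    """Mean fundamental frequency (Hz) of one utterance; NaN if unvoiced."""
    seg = y[int(start * sr):int(end * sr)]
    f0, voiced, _ = librosa.pyin(seg, fmin=75, fmax=600, sr=sr)
    return np.nanmean(f0)  # pyin marks unvoiced frames as NaN

def alignment_across_scales(pitch_inf, pitch_cg, scales=(1, 5, 25)):
    """Correlate infant and caregiver mean F0 after averaging over windows
    of `scales` consecutive utterance pairs (local -> global)."""
    out = {}
    for k in scales:
        n = min(len(pitch_inf), len(pitch_cg)) // k * k
        a = np.nanmean(np.reshape(pitch_inf[:n], (-1, k)), axis=1)
        b = np.nanmean(np.reshape(pitch_cg[:n], (-1, k)), axis=1)
        ok = ~np.isnan(a) & ~np.isnan(b)
        out[k] = pearsonr(a[ok], b[ok]) if ok.sum() > 2 else None
    return out

y, sr = librosa.load("daylong_recording.wav", sr=16000)  # hypothetical file
# `utterances` would come from diarization/annotation: (speaker, start, end).
utterances = [("CHI", 12.4, 13.1), ("MOT", 13.5, 14.8)]  # toy example
pitch_inf = np.array([mean_f0(y, sr, s, e) for spk, s, e in utterances if spk == "CHI"])
pitch_cg = np.array([mean_f0(y, sr, s, e) for spk, s, e in utterances if spk == "MOT"])
print(alignment_across_scales(pitch_inf, pitch_cg))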
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
In the original plan, the data collection and feature extraction work (Task 1 and Task 2) addressing Q1 and Q2 was the main work of the first year. Day-long audio recordings of 30 Japanese-speaking infants were planned to be collected longitudinally from 6 to 12 months of age (once every three months). Data collection at 6 months has been completed, and 24 samples at 9 months and 13 samples at 12 months have been collected; data collection will be finished early in the second year as planned. In addition to the audio data, activity-dependent video recordings were also collected. Before bottom-up features can be extracted from the collected data, speaker/motion/object recognition and noise reduction need to be performed on the raw recordings. We therefore created an instruction manual for preprocessing natural day-long recordings of infants' interactions, which combines automatic methods (e.g., machine learning algorithms) with manual annotation. We hope this manual can also be used for similar corpora from other cultures. Furthermore, automatic methods for extracting prosodic and temporal features from the audio data and facial/bodily features from the video data were confirmed. Data analysis has so far focused on the multi-time-scale audio data (Task 3). Multimodal analysis has not yet been undertaken because of the limited quality of the video data; advanced automatic computational models are planned to be introduced to address this challenge.
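As one concrete example of such a preprocessing step, the sketch below proposes speech-like segments with a simple energy threshold so that annotators can then label and correct speakers. This is a hypothetical illustration, not the project's actual tool; the file name, threshold, and minimum duration are assumptions.

# Illustrative sketch of one automatic preprocessing step: energy-based
# voice-activity detection that proposes segments for manual annotation.
# An assumption of how such a step could look, not the project's tool.
import numpy as np
import librosa

def propose_segments(path, frame_s=0.03, thresh_db=-35.0, min_dur=0.3):
    """Return (start, end) candidates in seconds where the signal is loud
    enough to contain speech; annotators then label/correct the speaker."""
    y, sr = librosa.load(path, sr=16000)
    hop = int(frame_s * sr)
    rms = librosa.feature.rms(y=y, frame_length=2 * hop, hop_length=hop)[0]
    db = librosa.amplitude_to_db(rms, ref=np.max)
    active = db > thresh_db
    segments, start = [], None
    for i, a in enumerate(active):
        t = i * frame_s
        if a and start is None:
            start = t
        elif not a and start is not None:
            if t - start >= min_dur:
                segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(active) * frame_s))
    return segments

for s, e in propose_segments("daylong_recording.wav"):  # hypothetical file
    print(f"{s:8.2f}  {e:8.2f}  SPEAKER=?  # to be filled by annotator")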
Strategy for Future Research Activity
The next steps will focus on (a) infant–caregiver coordination of multimodal cues (Q1, Q2) and (b) the development of the models infants use to learn interactive ability (Q3). For the multimodal analysis (Task 3), given the quality of the video from the natural recordings, the analysis will focus on quantifying the amount of social cues (e.g., facial and bodily cues) presented to the infants, as sketched below; the stability and dynamics of these inputs will then be measured using data across time scales. For the developmental analysis (Task 4), we will investigate the dynamic development of infant–caregiver coordination with social cues from different modalities. The project is proceeding smoothly according to the original plan.
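A minimal sketch of what quantifying facial-cue input could look like, using OpenCV's stock Haar face detector as an assumed stand-in for the project's models; the video file name, sampling rate, and window length are hypothetical.

# Hedged sketch for Task 3: fraction of sampled frames per time window in
# which a face is visible -- a crude proxy for facial social input.
import cv2

def facial_cue_rate(video_path, window_s=60.0):
    """Per-window fraction of sampled frames containing a detected face."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps))          # sample roughly 1 frame per second
    per_window, hits, seen, idx = [], 0, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.1, 5)
            hits += int(len(faces) > 0)
            seen += 1
            if seen * step / fps >= window_s:
                per_window.append(hits / seen)
                hits = seen = 0
        idx += 1
    cap.release()
    return per_window

print(facial_cue_rate("home_visit.mp4"))  # hypothetical file name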
Report
(1 result)
Research Products
(2 results)