2020 Fiscal Year Research-status Report
Audio-visual learning in neural network for elderly surveillance
Project/Area Number |
19K20335
|
Research Institution | University of Tsukuba |
Principal Investigator |
Gatto Bernardo University of Tsukuba, Center for Artificial Intelligence Research, Researcher (10826267)
|
Project Period (FY) |
2019-04-01 – 2023-03-31
|
Keywords | elderly surveillance / subspace representation / image recognition / deep learning |
Outline of Annual Research Achievements |
In the second year of the project "Audio-visual learning in neural network for elderly surveillance", we developed subspace-based tensor methods for extracting visual features from videos. The results were published in two journals: 1) Expert Systems with Applications, "Tensor Analysis with n-Mode Generalized Difference Subspace" (Authors: Bernardo B. Gatto, E. M. Santos, Alessandro L. Koerich, Kazuhiro Fukui, W. S. S. Junior) and 2) Applied Soft Computing, "Multilinear Clustering via Tensor Fukunaga-Koontz Transform with Fisher Eigenspectrum Regularization" (Authors: Bernardo B. Gatto, E. M. Santos, Marco A. F. Molinetti, Kazuhiro Fukui). These methods can quickly extract supervised or unsupervised features for video processing, which may improve elderly surveillance systems. For the second year of research, we had proposed the following activities, all of which were carried out as planned: 1) Evaluation of different types of subspaces: Last year, we learned that falling is one of the main risks for elderly people living alone. Therefore, this year we developed subspace-based solutions able to describe falls appropriately. 2) Comprehensive evaluation: We collaborated with international institutions to acquire datasets and new knowledge to enrich our research. For instance, we have been working with researchers from the University of Bordeaux and the University of Quebec. 3) Reporting experimental results in journals: In addition to the publications mentioned above, we are developing new methods for future reporting.
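As a minimal illustration of the basic ingredients that subspace-based tensor methods rely on, the sketch below unfolds a video tensor along each mode, takes a low-dimensional subspace per mode, and compares two videos through canonical angles. It is not the published n-mode GDS or tensor Fukunaga-Koontz algorithm; the function names, the subspace dimension `dim`, and the synthetic clips are assumptions made only for this example.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front, flatten the rest into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_subspaces(video, dim):
    """Orthonormal basis (first `dim` left singular vectors) of each mode unfolding."""
    bases = []
    for mode in range(video.ndim):
        u, _, _ = np.linalg.svd(unfold(video, mode), full_matrices=False)
        bases.append(u[:, :dim])
    return bases

def subspace_similarity(a, b):
    """Mean squared cosine of the canonical angles between span(a) and span(b)."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0) ** 2))

def video_similarity(x, y, dim=3):
    """Average the per-mode subspace similarities of two equally sized videos."""
    pairs = zip(mode_subspaces(x, dim), mode_subspaces(y, dim))
    return float(np.mean([subspace_similarity(a, b) for a, b in pairs]))

# Synthetic stand-ins for two clips shaped (height, width, frames).
rng = np.random.default_rng(0)
clip_a = rng.standard_normal((16, 12, 10))
clip_b = rng.standard_normal((16, 12, 10))

print(video_similarity(clip_a, clip_a))  # identical clips give a value close to 1
print(video_similarity(clip_a, clip_b))  # unrelated clips give a value below 1
```

In a supervised setting, one such set of mode subspaces would typically be computed per class (for example, "fall" and "no fall"), and a query video assigned to the class with the highest similarity.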
|
Current Status of Research Progress |
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
The current status of the research is organized into three topics, as follows: (1) Method enhancement: At this stage, we are examining the proposed subspace methods and how to enhance them in order to solve problems other than surveillance. This step is important to guarantee the applicability of the proposed methods in distinct scenarios. For example, singular spectrum analysis can represent wave data, which may benefit our method. (2) Comprehensive formulation: At present, we are evaluating the proposed neural networks and tensor decomposition methods on falling action recognition, which consists of detecting whether or not a person is falling. This evaluation is required since falls cause most of the accidents experienced by the elderly. Since we developed two collections of solutions (one based on neural networks and another based on tensor analysis), we expect that combining them may improve the accuracy of the system and provide a theoretical basis for neural networks based on tensor decompositions. (3) Study of theoretical limits: The current limitations of the proposed solutions are of interest to the research community since they provide clues for new network architectures. We have been reporting our preliminary results in various journals, and we are currently preparing results for two other journals in acoustics. Our recent findings lead us toward deep neural solutions and subspace fusion. Thus, we are developing deep neural-based solutions for tensor fusion, representation and classification.
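To make the remark about singular spectrum analysis (SSA) concrete, the following is a minimal sketch of basic SSA on a one-dimensional signal: the signal is embedded into a trajectory matrix, decomposed by SVD, and each rank-one term is mapped back to a series by anti-diagonal averaging. The window length, the synthetic sine wave, and the function name `ssa_components` are assumptions for illustration; how SSA would actually be combined with the proposed subspace methods is not specified in this report.

```python
import numpy as np

def ssa_components(signal, window):
    """Basic SSA: embed, decompose with SVD, and diagonal-average each rank-1 term."""
    n = len(signal)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds signal[j : j + window].
    traj = np.column_stack([signal[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = []
    for i in range(len(s)):
        elem = s[i] * np.outer(u[:, i], vt[i])
        # Averaging the anti-diagonals maps the matrix back to a length-n series.
        series = [elem[::-1].diagonal(d).mean() for d in range(-window + 1, k)]
        components.append(np.array(series))
    return np.array(components), s

# Synthetic "wave" data: a 5 Hz sine plus noise (stand-in for an audio frame).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2.0 * np.pi * 5.0 * t)
noisy = clean + 0.1 * rng.standard_normal(t.size)

components, singular_values = ssa_components(noisy, window=40)
# A pure sinusoid occupies two SSA components, so keep the leading pair.
denoised = components[:2].sum(axis=0)
print(np.corrcoef(denoised, clean)[0, 1])
```

The left singular vectors of the trajectory matrix span a subspace of the signal, which is the sense in which SSA fits naturally alongside subspace representations of video.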
|
Strategy for Future Research Activity |
For future work, we aim to improve the proposed neural networks and tensor analysis methods. We understand that the current research direction is leading to a data fusion scheme in which tensor data from both video and audio are regarded as multimodal distributions, so that a tensor fusion analysis is required. We believe that such an approach is novel and may reveal new limits of subspace learning. We also plan to investigate the theoretical limits of the proposed neural networks and tensor analysis methods. In the particular case of an elderly surveillance system, interpretability is a requirement that cannot be avoided. As a practical example, falling objects usually produce sound, which can be exploited for prediction. However, so far it remains challenging to determine whether sound or video describes a falling object better. In the absence of video information, or under occlusion, is sound information sufficient to describe a falling object? (1) We aim to investigate the learning limits of training data for subspace fusion. Since it is costly to label the activities of the elderly, how much data is necessary to yield a satisfactory learning model is an essential open question. (2) A subspace-based network equipped with tensor analysis capabilities may handle both acoustic and visual data, providing a flexible and robust learning model. As in the last research year, we intend to report our findings in two journal papers in relevant transactions on data fusion for tensor analysis.
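One simple way to picture the audio-visual question raised above is late fusion of per-modality subspace similarities, with a fallback to audio alone when video is unavailable. The sketch below is only an assumed baseline for illustration, not the tensor fusion analysis planned in this project; the dictionary layout, the `weight` parameter, and the synthetic feature matrices are all hypothetical.

```python
import numpy as np

def basis(samples, dim):
    """Orthonormal basis of the `dim`-dimensional subspace spanned by the columns."""
    u, _, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :dim]

def similarity(a, b):
    """Mean squared cosine of the canonical angles between two subspaces."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.mean(np.clip(cosines, 0.0, 1.0) ** 2))

def fused_score(query, reference, weight=0.5):
    """Weighted late fusion; falls back to audio when either video subspace is missing."""
    audio = similarity(query["audio"], reference["audio"])
    if query.get("video") is None or reference.get("video") is None:
        return audio
    video = similarity(query["video"], reference["video"])
    return weight * video + (1.0 - weight) * audio

# Synthetic features: columns are frames, rows are feature dimensions.
rng = np.random.default_rng(0)
reference = {
    "audio": basis(rng.standard_normal((20, 30)), dim=4),
    "video": basis(rng.standard_normal((64, 30)), dim=4),
}
query = {
    "audio": basis(rng.standard_normal((20, 30)), dim=4),
    "video": basis(rng.standard_normal((64, 30)), dim=4),
}
occluded = {"audio": query["audio"], "video": None}  # camera view blocked

print(fused_score(query, reference))     # uses both modalities
print(fused_score(occluded, reference))  # audio-only fallback
```

Varying `weight` on labeled data would give a first, rough indication of how much each modality contributes to describing a fall.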
|
Causes of Carryover |
In the following research year, we aim to report our investigations and results at two international conferences and in journal papers. These articles will explain the fusion aspects of tensor subspace learning for acoustic and video data. In addition, since optimizing deep neural network models is hardware intensive, we plan to acquire a computer server to run computationally intensive experiments. We will continue our collaboration with international institutions to advance our research, organizing workshops to collect databases and exchange information.
|