2019 Fiscal Year Research-status Report
Audio-visual learning in neural network for elderly surveillance
|Research Institution||University of Tsukuba |
Gatto Bernardo  University of Tsukuba, Center for Artificial Intelligence Research, Researcher (10826267)
|Project Period (FY)
2019-04-01 – 2023-03-31
|Keywords||elderly surveillance / subspace representation / image recognition / deep learning|
|Outline of Annual Research Achievements
In the first period of the project, titled “Audio-visual learning in neural network for elderly surveillance”, we developed a fast subspace-based neural network for extracting visual features from videos. This work was published in two journal papers: (1) EURASIP Journal on Image and Video Processing, "A semi-supervised convolutional neural network based on subspace representation for image classification" (Authors: Bernardo B. Gatto, L. S. Souza, E. M. Santos, Kazuhiro Fukui, W. S. S. Junior, and K. V. Santos) and (2) Neural Processing Letters, "Fukunaga-Koontz convolutional network with applications on character classification" (Authors: Bernardo B. Gatto, E. M. Santos, Kazuhiro Fukui, W. S. S. Junior, and K. V. Santos). The proposed networks can quickly extract features for image and video processing in elderly surveillance systems. For the first year of research, we proposed the following activities, all of which were carried out as planned:
(1) Problem definition: we aimed to study the main problems in elderly activity recognition and to determine which activities are safe and which are not. In this study (still in progress this year), we learned that falling is one of the main risks for elderly people living alone.
(2) Data acquisition: we are sharing a database so that the proposed methods can be evaluated properly. We have been collaborating with international institutions to acquire such data; for instance, we have been working with researchers from the University of Bordeaux.
(3) Basic evaluation: this year, an initial evaluation of the proposed networks was performed on several databases.
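To illustrate how convolution filters can be derived from a subspace rather than trained by backpropagation, the following minimal sketch learns filters as the leading PCA basis of image patches. It is an assumption-laden illustration and not the implementation of the published networks: the function names `learn_subspace_filters` and `extract_features`, the patch size, and the number of filters are all hypothetical choices made for this example.

```python
import numpy as np


def learn_subspace_filters(images, patch_size=5, num_filters=4):
    """Learn filters as the leading PCA basis of mean-removed image patches.

    Illustrative only: names and default sizes are assumptions,
    not taken from the published networks.
    """
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - patch_size + 1):
            for j in range(w - patch_size + 1):
                p = img[i:i + patch_size, j:j + patch_size].ravel()
                patches.append(p - p.mean())  # remove the patch mean
    X = np.asarray(patches)  # shape: (num_patches, patch_size ** 2)
    # Rows of vt are orthonormal directions spanning the patch subspace,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:num_filters].reshape(num_filters, patch_size, patch_size)


def extract_features(img, filters):
    """Apply each filter to the image ('valid' cross-correlation)."""
    k = filters.shape[1]
    h, w = img.shape
    out = np.empty((filters.shape[0], h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = img[i:i + k, j:j + k]
            # Inner product of the patch with every filter at once.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return out
```

Because the filters come from a single singular value decomposition, no iterative optimization is needed, which is one plausible reason such networks can be fast to build.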
|Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
The current status of the research is divided into three topics, as follows:
(1) Evaluation of different types of subspaces: we are surveying the available subspace methods and how they relate to the problem we are attempting to solve. For instance, singular spectrum analysis is able to represent acoustic data, which may benefit our method.
(2) Comprehensive evaluation: we are currently evaluating the proposed neural networks on falling action recognition, which consists of detecting whether or not a person is falling. This problem is essential to solve, since falls cause most of the accidents suffered by the elderly. With a learning model embedded in a system that can quickly report whether a person is falling, the authorities can be contacted immediately, saving time before emergency care.
(3) Reporting experimental results in journals: we have already reported preliminary results in two reputable journals, and we are currently preparing results on acoustic and video representation for two other journals. The current findings guide us towards deep neural solutions, in which the developed networks represent data through several layers. We are currently developing a deep neural network based on subspaces for multimodal representation and classification.
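Singular spectrum analysis, mentioned in topic (1), can be sketched as follows for a one-dimensional signal such as an audio frame. This is a minimal illustration under stated assumptions, not the method evaluated in the project: the function names `ssa_subspace` and `ssa_reconstruct`, the window length, and the rank are hypothetical.

```python
import numpy as np


def _trajectory_matrix(signal, window):
    """Stack lagged copies of the signal as columns (a Hankel matrix)."""
    k = signal.size - window + 1
    return np.stack([signal[j:j + window] for j in range(k)], axis=1)


def ssa_subspace(signal, window=20, rank=2):
    """Return an orthonormal basis of the leading SSA subspace.

    Illustrative only: window and rank are assumed values.
    """
    signal = np.asarray(signal, dtype=float)
    traj = _trajectory_matrix(signal, window)  # shape: (window, k)
    u, s, _ = np.linalg.svd(traj, full_matrices=False)
    return u[:, :rank], s


def ssa_reconstruct(signal, basis):
    """Project the trajectory matrix onto the subspace, then average
    overlapping entries to recover a series of the original length."""
    signal = np.asarray(signal, dtype=float)
    window = basis.shape[0]
    traj = _trajectory_matrix(signal, window)
    approx = basis @ (basis.T @ traj)
    out = np.zeros(signal.size)
    counts = np.zeros(signal.size)
    for j in range(traj.shape[1]):
        out[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return out / counts
```

For a pure sinusoid the trajectory matrix has rank two, so a two-dimensional subspace already captures the signal; real acoustic data would generally need a higher rank chosen from the singular value spectrum.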
|Strategy for Future Research Activity
For future work, we plan to enhance the proposed neural networks and to propose a data fusion formulation in which the network combines both visual and acoustic subspaces. We also plan to investigate the theoretical limits of the methods proposed so far. The learning models should not only classify new instances correctly but also indicate the limits of what can be learned. In the particular case of an elderly surveillance system, interpretability is a requirement that cannot be avoided.
(1) We aim to explore the lower bound on the amount of training data. Since labeling the actions of elderly people is costly, how much data is necessary to yield a satisfactory learning model? In a realistic scenario, one may not have sufficient data to train a satisfactory model, or one may hold excess data, which is inconvenient for practitioners. Therefore, a practical guide on how to obtain an optimal training set size is desirable.
(2) A subspace-based network for data fusion collects both acoustic and visual data from the outputs of the sensors (e.g., cameras, microphones) and progressively applies feature extraction in a hierarchical fashion. As a result, we expect to enhance the information obtained from the fused sensor data. The potential improvement lies in the model's accuracy and its flexibility in handling multimodal data.
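One possible reading of the hierarchical fusion in item (2) is sketched below: each modality is first projected onto its own PCA subspace, and a second subspace is then extracted from the concatenated coefficients. This is only a sketch under stated assumptions, not the planned formulation: the function names `pca_basis` and `fuse_modalities`, the subspace dimensions, and the per-modality rescaling step are all hypothetical.

```python
import numpy as np


def pca_basis(X, dim):
    """Orthonormal basis (features x dim) of the leading principal
    subspace of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:dim].T


def fuse_modalities(visual, acoustic, dim_v=8, dim_a=4, dim_fused=6):
    """Two-stage subspace fusion of row-aligned visual and acoustic features.

    Illustrative only: dimensions and the rescaling are assumptions.
    """
    # Stage 1: one subspace per modality.
    zv = (visual - visual.mean(axis=0)) @ pca_basis(visual, dim_v)
    za = (acoustic - acoustic.mean(axis=0)) @ pca_basis(acoustic, dim_a)
    # Assumed normalization so that neither modality dominates the fusion.
    zv = zv / (np.linalg.norm(zv) + 1e-12)
    za = za / (np.linalg.norm(za) + 1e-12)
    # Stage 2: a subspace of the concatenated coefficients.
    joint = np.hstack([zv, za])
    fused_basis = pca_basis(joint, dim_fused)
    return (joint - joint.mean(axis=0)) @ fused_basis
```

Each row of the inputs is assumed to describe the same time segment in both modalities, so the fused output keeps one row per segment and can be passed to any downstream classifier.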
In the next research year, we aim to report our findings in two journal papers in relevant transactions on data fusion. These reports should describe the fusion aspects of subspace learning and the theoretical limits of the proposed models.
|Causes of Carryover
In the next research year, we aim to report our findings at two international conferences and in journal papers in relevant transactions on data fusion. These reports should describe the fusion aspects of subspace learning and the theoretical limits of the proposed models.
Also, since optimizing deep neural network models is hardware intensive, we aim to acquire GPUs and new computers to meet this demand. Collaboration with international institutions is a third goal that supports the improvement of our research, enabling workshops to collect databases and exchange information.
Research Products (3 results)