Project/Area Number | 19K12023 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 61010: Perceptual information processing-related |
Research Institution | Osaka Prefecture University |
Principal Investigator |
Co-Investigator (Kenkyū-buntansha) |
Iwamura Masakazu, Osaka Prefecture University, Graduate School of Engineering, Associate Professor (80361129)
Inoue Katsufumi, Osaka Prefecture University, Graduate School of Engineering, Associate Professor (50733804)
|
Project Period (FY) | 2019-04-01 – 2022-03-31 |
Project Status | Completed (Fiscal Year 2021) |
Budget Amount |
¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2021: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2020: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2019: ¥3,120,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥720,000)
|
Keywords | Sign Language Recognition / 3D Convolutional Neural Networks / Deep Learning / Attention Network / I3D Network / Temporal Information / Multi-stream Network / Optical Flow / Skeleton / Face / Hand / 3D Avatar Model / Machine Learning / Natural Language Processing / Synthetic Data Generation / Computer Vision / Sign Language / Data Synthesis |
Outline of Research at the Start |
Hearing-impaired people use sign language (SL) as their primary means of communication, performed through hand gestures, arm and body movements, facial expressions, and so on. To enable hearing people to understand SL, many gesture recognition approaches have been proposed. A limitation of these approaches is that they require large datasets to train the machine learning models, which in turn demand manual annotation of millions of gestures. To solve this, we propose to develop a 3D avatar model that mimics SL and to use it to generate synthetic training data, yielding a robust system for SL recognition.
|
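As a rough illustration of the data-synthesis idea above, here is a minimal NumPy sketch that perturbs a canonical joint-angle keyframe sequence to mass-produce labelled gesture clips. All names (synthesize_clip, JOINTS, the jitter parameters) and the use of raw skeleton sequences instead of rendered avatar video are illustrative assumptions, not the project's actual pipeline.

```python
# Sketch: mass-produce labelled synthetic gesture clips by randomising the
# timing and joint angles of one canonical keyframe sequence for a sign.
# Everything here is illustrative; it is not the project's code.
import numpy as np

JOINTS = 20          # joints in the hypothetical avatar skeleton
KEYFRAMES = 8        # canonical keyframes defining one sign
FRAMES = 32          # frames in one synthetic clip

def synthesize_clip(canonical, rng, angle_jitter=0.05, speed_jitter=0.2):
    """Produce one clip from a (KEYFRAMES, JOINTS, 3) keyframe array."""
    # Randomise signing speed by warping the time axis.
    speed = 1.0 + rng.uniform(-speed_jitter, speed_jitter)
    t = np.clip(np.linspace(0, KEYFRAMES - 1, FRAMES) * speed, 0, KEYFRAMES - 1)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, KEYFRAMES - 1)
    w = (t - lo)[:, None, None]
    clip = (1 - w) * canonical[lo] + w * canonical[hi]  # linear interpolation
    # Per-joint noise mimics signer-to-signer variation.
    clip += rng.normal(0.0, angle_jitter, clip.shape)
    return clip.astype(np.float32)

rng = np.random.default_rng(0)
canonical = rng.normal(size=(KEYFRAMES, JOINTS, 3))  # stand-in for a real sign
dataset = [(synthesize_clip(canonical, rng), "HELLO") for _ in range(100)]
print(dataset[0][0].shape)  # (32, 20, 3): one labelled synthetic clip
```

In a full pipeline, the sampled parameters would instead drive the 3D avatar renderer, so each synthetic sample would be a labelled video clip rather than a skeleton sequence.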
Outline of Final Research Achievements |
To improve the performance of existing word-level sign language recognition (WSLR), our first approach proposed a system with a multi-stream structure focusing on global information, local information, and skeleton information. The local information comprises handshape and facial expression, while the skeleton information captures the position of the hands relative to the body. By combining these three streams, the proposed method achieves higher recognition performance than state-of-the-art methods. In the second work, the I3D network, originally proposed for action recognition, was modified to improve WSLR performance. The modifications comprise an improved inception module, named the dilated inception module (DIM), and an attention-based temporal attention module (TAM) that identifies the features essential to a gesture. Minimal sketches of both ideas are given below.
|
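To make the multi-stream idea concrete, here is a minimal PyTorch sketch. The stand-in encoders, class count, and score-averaging fusion are assumptions; the summary only specifies that a global stream, a local (hand/face) stream, and a skeleton stream are combined.

```python
# Sketch of three-stream late fusion for WSLR; not the paper's architecture.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Stand-in for a 3D-CNN stream (e.g. an I3D backbone)."""
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (B, 16, 1, 1, 1)
            nn.Flatten(),
            nn.Linear(16, n_classes),
        )
    def forward(self, x):             # x: (B, C, T, H, W)
        return self.net(x)

class MultiStreamWSLR(nn.Module):
    def __init__(self, n_classes=100):
        super().__init__()
        self.global_stream = StreamEncoder(3, n_classes)    # full frames
        self.local_stream = StreamEncoder(3, n_classes)     # hand/face crops
        self.skeleton_stream = StreamEncoder(3, n_classes)  # rendered keypoints
    def forward(self, frames, crops, skeleton):
        return (self.global_stream(frames)
                + self.local_stream(crops)
                + self.skeleton_stream(skeleton)) / 3.0     # late fusion

model = MultiStreamWSLR()
x = torch.randn(2, 3, 16, 64, 64)  # same dummy tensor for all three inputs
print(model(x, x, x).shape)        # torch.Size([2, 100])
```

Likewise, a hedged sketch of the two I3D modifications. The channel counts, kernel shapes, and wiring are assumptions; only the two ideas come from the summary: parallel temporal convolutions with different dilation rates (DIM) and re-weighting frames by attention (TAM).

```python
# Sketch of a dilated inception module and a temporal attention module.
import torch
import torch.nn as nn

class DilatedInceptionModule(nn.Module):
    """Parallel temporal 3D convs with growing dilation, concatenated."""
    def __init__(self, in_ch, branch_ch=8, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, branch_ch, kernel_size=(3, 1, 1),
                      dilation=(d, 1, 1), padding=(d, 0, 0))
            for d in dilations
        ])
    def forward(self, x):                  # x: (B, C, T, H, W)
        return torch.cat([b(x) for b in self.branches], dim=1)

class TemporalAttentionModule(nn.Module):
    """Scores each frame and re-weights the feature map along time."""
    def __init__(self, in_ch):
        super().__init__()
        self.score = nn.Conv3d(in_ch, 1, kernel_size=1)
    def forward(self, x):                  # x: (B, C, T, H, W)
        s = self.score(x).mean(dim=(3, 4))             # (B, 1, T)
        w = torch.softmax(s, dim=2)[..., None, None]   # (B, 1, T, 1, 1)
        return x * w

feats = torch.randn(2, 32, 16, 7, 7)  # dummy features from an I3D backbone
dim = DilatedInceptionModule(32)
tam = TemporalAttentionModule(24)     # 3 branches * 8 channels
print(tam(dim(feats)).shape)          # torch.Size([2, 24, 16, 7, 7])
```

Dilation widens the temporal receptive field without adding parameters, while the softmax over time lets the network emphasize the frames most discriminative for a sign.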
Academic Significance and Societal Importance of the Research Achievements |
Word-level sign language recognition (WSLR) systems help overcome the communication barrier between hearing-impaired people and hearing people. In our approach, we combined local information, such as handshape and facial expression, with the relative positions of body parts, and achieved higher performance than existing methods on most WSLR datasets.
|