2023 Fiscal Year Final Research Report

Modelling Speech Spectra Based on Logarithmic Shallow Neural Networks

Project/Area Number	21K11957
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	The University of Electro-Communications
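Principal Investigator	Nakashika Toru, The University of Electro-Communications, Graduate School of Informatics and Engineering, Associate Professor (90749920)
Co-Investigator (Kenkyū-buntansha)	Yatabe Kohei (矢田部浩平), Tokyo University of Agriculture and Technology, Graduate School (Institute) of Engineering, Associate Professor (20801278)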
Project Period (FY)	2021-04-01 – 2024-03-31
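Keywords	speech coding / speech modeling / machine learning / complex probability distribution / Boltzmann machine / gamma distribution / von Mises distribution / source separation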
Outline of Final Research Achievements	Speech is one of the most important means of communication, and a variety of speech technologies are in everyday use. In recent years, deep learning has attracted worldwide attention and is often adopted as the backend of these technologies without much scrutiny. Although deep learning achieves very high performance on individual tasks, it has the drawbacks of a huge number of parameters and a high computational cost. For small devices with limited computational resources, compact machine learning models with fewer parameters are preferable. In this study, we proposed a new methodology and framework for a compact shallow model that represents data appropriately by focusing on the specific properties and structure of speech data, and we verified the effectiveness of the proposed model through multiple experiments.
Free Research Field	Speech processing
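Academic Significance and Societal Importance of the Research Achievements	Focusing on the data structure of speech, this study proposed a complex-valued shallow neural network that mainly represents the complex spectra of speech logarithmically. One important result is that this model, with only about 800 bytes of information, showed performance comparable to that of huge neural network models based on the latest deep learning techniques. This suggests that, rather than blindly increasing the number of parameters and enlarging the model, it is better to think carefully about how to represent the data appropriately. Such compact shallow models also reduce the power consumed by computation, and can therefore contribute as an energy-saving, environmentally conscious green computing approach.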