Construction and evaluation of a prosody control model for effective information transmission by speech to the elderly

Research Project

Project/Area Number	20K11869
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Suwa University of Science
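Principal Investigator	Mizuno Hideyuki, Suwa University of Science, Faculty of Engineering, Professor (30833892)
Co-Investigator (Kenkyū-buntansha)	Nakajima Hideharu, Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Senior Researcher (90832684)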
Project Period (FY)	2020-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000); Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2021: ¥260,000 (Direct Cost: ¥200,000, Indirect Cost: ¥60,000); Fiscal Year 2020: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
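Keywords	preparation of elderly-directed speech data / prosody analysis / prosody model construction / language model construction / linguistic model construction / utterance text collection / speech recording / elderly-directed speech / fundamental frequency / statistical analysis / speech synthesis / elderly / prosody control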
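Outline of Research at the Start	Research on helping elderly people grasp spoken information has so far concentrated on acoustic aids such as hearing aids, while little work has examined the speech characteristics that make content easy for elderly listeners to understand. This project therefore aims to establish a method for controlling prosody (speech features such as intonation and speaking rate) so that speech is easy for elderly listeners to hear, and proceeds in the following order. First, we collect speech that elderly listeners find easy to hear and analyze its linguistic and acoustic characteristics. Next, based on those characteristics, we establish an elderly-oriented prosody control method that adjusts prosody according to the content of a whole text or of part of a text. The results are expected to help compensate for the information gap faced by elderly people.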
Outline of Final Research Achievements	In 2020, we prepared 136 documents in which the important parts were labeled, and recorded a female speaker, rated by elderly listeners as the easiest to hear, uttering them in two styles: speech directed at elderly listeners and read speech. In 2021, we conducted a comparative analysis of the prosodic differences between the two types of speech and confirmed that, in the elderly-directed speech, the mean and range of F0 increased and the maximum F0 rose at important parts. In 2022, we constructed a prosody control model and confirmed by objective evaluation that the maximum F0 can be controlled with high accuracy (coefficient of determination of 0.75); however, a subjective evaluation using analysis-by-synthesis speech did not find an effect on ease of hearing. In addition, we constructed a language model that predicts important parts and confirmed that it predicts them with a high accuracy of about 81%.
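Academic Significance and Societal Importance of the Research Achievements	1) We collected parallel recordings in which a speaker rated as easy to hear by elderly listeners uttered the same texts both as speech directed at elderly listeners and as read speech, and statistically analyzed the prosodic differences between the two, thereby clarifying the prosodic characteristics of speech that elderly listeners find easy to hear. 2) We constructed a prosody prediction model that predicts the prosody of elderly-directed speech from read speech, showed that it predicts with high accuracy, and showed that ordinary read speech can be converted into speech that is easy for elderly listeners to hear. 3) We showed that a language model can predict with high accuracy the parts of a document that are important from the viewpoint of information acquisition by elderly people.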
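Note on the reported measures	The record reports F0 statistics (mean, range, maximum) and a coefficient of determination for the predicted maximum F0, but does not describe how they were computed. The Python sketch below shows one conventional way to obtain such figures. It is an illustration under stated assumptions, not the project's implementation: the function names `f0_summary` and `r_squared`, the convention that unvoiced frames are stored as 0 or None, and all numbers in the usage lines are hypothetical.

```python
from statistics import mean


def f0_summary(f0_hz):
    """Summarize an F0 contour given in Hz.

    Assumption: unvoiced frames are stored as 0 or None and are skipped.
    """
    voiced = [f for f in f0_hz if f]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    return {
        "mean": mean(voiced),
        "range": max(voiced) - min(voiced),
        "max": max(voiced),
    }


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    m = mean(observed)
    ss_tot = sum((y - m) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up contours for one phrase in the two speaking styles (Hz).
    read_style = [0, 180, 195, 210, 0, 190]
    elderly_directed = [0, 200, 230, 260, 0, 215]
    print(f0_summary(read_style))
    print(f0_summary(elderly_directed))

    # Made-up observed vs. model-predicted maximum F0 per important part (Hz).
    observed_max = [260.0, 245.0, 280.0, 255.0]
    predicted_max = [255.0, 250.0, 270.0, 258.0]
    print(r_squared(observed_max, predicted_max))
```

A value of 1.0 from `r_squared` means the predictions match the observations exactly, and 0.0 means they do no better than always predicting the observed mean; the 0.75 reported above lies between these two reference points.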