2022 Fiscal Year Final Research Report

Spoken dialog system based on real-time control of dialogue tempo for smooth dialog

Project/Area Number	19K04311
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 20020:Robotics and intelligent system-related
Research Institution	The University of Tokushima
Principal Investigator	NISHIMURA Ryota, Tokushima University, Graduate School of Technology, Industrial and Social Sciences (Science and Technology), Lecturer (50635878)
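Co-Investigator (Kenkyū-buntansha)	YAMAMOTO Kazumasa, Chubu University, College of Engineering, Professor (40324230); NISHIZAKI Hiromitsu, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Professor (40362082)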
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	spoken dialog system / timing / tempo / spoken language information processing / deep learning / speech recognition / real-time control / ROS
Outline of Final Research Achievements	In this research, we constructed a spoken dialog system that can be controlled in real time. The system is built on the ROS (Robot Operating System) architecture, which makes communication between components easy to manage and debug during system development. For this system we also built a back-channel timing control module, which controls when the system produces back-channel responses. To keep the module fast enough for real-time operation, it uses only simple acoustic information as input, and the model itself is a simple LSTM. The model generates back-channel timing with an F-value of up to 0.933. It has been released as open source on GitHub and is publicly available.
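The outline specifies only that the back-channel timing model uses "simple acoustic information" and "a simple LSTM". It gives no details of the features, network size, or decision rule. The following is a minimal sketch, not the released model, of how an LSTM of this kind can run in real time. It consumes one acoustic feature frame at a time, carries its hidden and cell state across frames, and emits a back-channel probability for every frame. Several things here are hypothetical and chosen only for illustration: the two-dimensional input (for example log power and normalised F0), the hidden size, the random placeholder weights, the 0.5 threshold, and the names `StreamingLSTMBackchannel` and `backchannel_frames`.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class StreamingLSTMBackchannel:
    """Hypothetical frame-by-frame LSTM that scores back-channel timing.

    Weights are random placeholders (assumption); a real model would be
    trained on dialogue data labelled with back-channel positions.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        rows = 4 * hidden_dim  # stacked input, forget, candidate and output gates
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)] for _ in range(rows)]
        self.U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(rows)]
        self.b = [0.0] * rows
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0
        self.reset()

    def reset(self) -> None:
        """Clear the recurrent state, e.g. at the start of a new dialogue."""
        self.h = [0.0] * self.hidden_dim
        self.c = [0.0] * self.hidden_dim

    def step(self, frame: Sequence[float]) -> float:
        """Consume one feature frame and return P(back-channel at this frame)."""
        if len(frame) != self.input_dim:
            raise ValueError("unexpected feature dimension")
        H = self.hidden_dim
        # Pre-activations for all four gates: W x_t + U h_{t-1} + b.
        z = [
            self.b[r]
            + sum(w * x for w, x in zip(self.W[r], frame))
            + sum(u * h for u, h in zip(self.U[r], self.h))
            for r in range(4 * H)
        ]
        new_c, new_h = [], []
        for j in range(H):
            i = sigmoid(z[j])               # input gate
            f = sigmoid(z[H + j])           # forget gate
            g = math.tanh(z[2 * H + j])     # candidate cell value
            o = sigmoid(z[3 * H + j])       # output gate
            c = f * self.c[j] + i * g
            new_c.append(c)
            new_h.append(o * math.tanh(c))
        self.c, self.h = new_c, new_h
        # Logistic output layer on the current hidden state.
        return sigmoid(self.b_out + sum(w * h for w, h in zip(self.w_out, self.h)))


def backchannel_frames(model: StreamingLSTMBackchannel,
                       frames: Sequence[Sequence[float]],
                       threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose score exceeds the (assumed) threshold."""
    return [t for t, frame in enumerate(frames) if model.step(frame) > threshold]


if __name__ == "__main__":
    model = StreamingLSTMBackchannel(input_dim=2, hidden_dim=8)
    # Hypothetical per-frame features: (log power, normalised F0).
    stream = [[-3.0, 0.2], [-2.5, 0.1], [-6.0, 0.0], [-6.5, 0.0]]
    for t, frame in enumerate(stream):
        print(t, round(model.step(frame), 3))
```

Each call to `step` does a fixed amount of work no matter how long the user has been speaking, so per-frame latency stays constant. That property is what makes a small recurrent model attractive for real-time timing control. In a ROS-based system such as the one described, a predictor like this would likely run inside a node that receives acoustic features and publishes back-channel triggers. That wiring is also an assumption, not something the report specifies.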
Free Research Field	Spoken dialog systems
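Academic Significance and Societal Importance of the Research Achievements	Large language models such as ChatGPT have recently become far more accurate. As a result, users have begun to expect more accurate and natural dialogue from spoken dialog systems as well. The tempo of a dialogue is essential to natural conversation, yet conventional spoken dialog systems could not control it because of limitations in their design. The results of this research resolve this problem and make it possible to develop and operate spoken dialog systems that can be controlled in real time. We also built a fast back-channel response timing generation model and integrated it into the system. As a result, developers of spoken dialog systems can focus on response content and other parts of the system and still obtain a natural spoken dialog system.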