2021 Fiscal Year Annual Research Report

Realizing natural interaction based on simultaneous generation of speech and bodily expression for conversational AI

Research Project

Project/Area Number	20K19903
Research Institution	NTT Communication Science Laboratories
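Principal Investigator	Yuya Chiba, Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Innovative Communication Laboratory, Research Scientist (30780936)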
Project Period (FY)	2020-04-01 – 2022-03-31
Keywords	Spoken dialogue system / Multimodal information processing / Response generation
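Outline of Annual Research Achievements	In this research project, we first investigated a neural spoken-response model that uses the linguistic and prosodic information of user utterances as its input and output. The proposed model takes as input the word sequence of the user utterance and its mean log F0 sequence, and outputs the response utterance together with a differential F0 context sequence that controls its prosody. Experiments confirmed that the proposed method yields F0 sequences closer to natural speech than the baseline. Next, we investigated a multimodal response generation model that extends the spoken-response model to handle facial-expression control signals. This model sequentially receives the word sequence of the user utterance along with prosodic and facial-expression features, and outputs prosodic and facial-expression control signals for each word of the system response. The prosodic and facial-expression features are the mean log F0 and the mean AU (action unit) values over the corresponding word interval. Experimental results suggested that taking multiple modalities into account as input improves model performance. The proposed model was trained on data from free two-party conversations. To improve the training efficiency of response generation models targeting spontaneous speech, we proposed a data augmentation method that inserts fillers into tweet-reply pairs collected from Twitter. This method inserts fillers into written text with higher performance, in terms of F-measure, than the conventional method. We further investigated a response timing estimation model that uses multimodal information. In this study, we proposed a model that introduces a dialogue context encoder into the existing Response Timing Network. The results showed that, by incorporating image information, the proposed method achieves performance comparable to prior work without using the future system utterance. Overall, these results led to 6 presentations at domestic conferences and workshops, 4 international conference presentations, and 1 patent application.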
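The word-level prosodic feature mentioned in the outline (mean log F0 over each word interval) can be illustrated with a minimal sketch. This is not the project's implementation: the function name `mean_log_f0_per_word`, the frame shift, the convention that unvoiced frames carry F0 = 0, and the choice to skip those frames are all assumptions made for illustration.

```python
import math


def mean_log_f0_per_word(f0, word_spans, frame_shift=0.005):
    """Average log F0 over the frames of each word interval.

    f0          : per-frame F0 values in Hz; 0 marks an unvoiced frame (assumed convention)
    word_spans  : list of (word, start_sec, end_sec) tuples from a forced alignment
    frame_shift : frame period in seconds (assumed value)
    """
    features = []
    for word, start, end in word_spans:
        first = int(round(start / frame_shift))
        last = int(round(end / frame_shift))
        # Only voiced frames contribute; log(0) is undefined.
        voiced = [math.log(v) for v in f0[first:last] if v > 0]
        mean = sum(voiced) / len(voiced) if voiced else None
        features.append((word, mean))
    return features
```

A word with no voiced frames gets `None` here. How such words were actually handled is not stated in the report.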

Research Products
(5 results)

All 2021

All Presentation (4 results) (of which Int'l Joint Research: 3 results) Patent(Industrial Property Rights) (1 result)

[Presentation] Multimodal dialogue response timing estimation using dialogue context encoder (2021)
- Author(s)
  Ryota Yahagi, Yuya Chiba, Takashi Nose, Akinori Ito
- Organizer
  International Workshop on Spoken Dialog System Technology
- Int'l Joint Research
[Presentation] Speaker intimacy in chat-talks: Analysis and recognition based on verbal and non-verbal information (2021)
- Author(s)
  Yuya Chiba, Yoshihiro Yamazaki, Akinori Ito
- Organizer
  Workshop on the Semantics and Pragmatics of Dialogue
- Int'l Joint Research
[Presentation] Neural spoken-response generation using prosodic and linguistic context for conversational systems (2021)
- Author(s)
  Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito
- Organizer
  Interspeech
- Int'l Joint Research
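[Presentation] A study on generating prosodic and facial-expression control signals for system responses based on multimodal information (2021)
- Author(s)
  渡辺稜哉, Yuya Chiba, Takashi Nose, Akinori Ito
- Organizer
  Japanese Society for Artificial Intelligence (JSAI) SIG meeting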
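[Patent(Industrial Property Rights)] Piecewise prosody control technique for spoken dialogue systems (2021)
- Inventor(s)
  Yoshihiro Yamazaki, Takashi Nose, Akinori Ito, Yuya Chiba
- Industrial Property Rights Holder
  Yoshihiro Yamazaki, Takashi Nose, Akinori Ito, Yuya Chiba
- Industrial Property Rights Type
  Patent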
- Industrial Property Number
  2021183018
