2021 Fiscal Year Annual Research Report

End-to-End Model for Task-Independent Speech Understanding and Dialogue

Research Project

Project/Area Number	20H00602
Research Institution	Kyoto University
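Principal Investigator	Kawahara Tatsuya (河原達也), Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	Inoue Koji (井上昂治), Kyoto University, Graduate School of Informatics, Assistant Professor (10838684); Yoshii Kazuyoshi (吉井和佳), Kyoto University, Graduate School of Informatics, Associate Professor (20510001)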
Project Period (FY)	2020-04-01 – 2024-03-31
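Keywords	speech understanding / spoken dialogue / speech recognition / End-to-End model
Outline of Annual Research Achievements	We studied models that understand a speaker's intentions, concepts, and emotions and respond to them in the kind of spoken communication that takes place between humans. First, we advanced End-to-End speech recognition, the foundation of this work, in several directions. We implemented speech recognition based on a streaming attention model so that long utterances can be handled without delay. We investigated methods for using a high-accuracy bidirectional Transformer language model, pre-trained on large-scale text data, for knowledge distillation into speech recognition and for rescoring. These methods were evaluated on standard Japanese and English databases and achieved high performance. Next, we studied a model that predicts punctuation marks corresponding to speech-act units, and built an End-to-End model by integrating it with the speech recognition network. Evaluations on Japanese and English databases confirmed its effectiveness. We also studied an End-to-End model that recognizes emotion from speech. It was evaluated on Japanese and English databases and achieved state-of-the-art performance. We further examined the integration of emotion recognition based on speech information with emotion recognition based on linguistic information, and confirmed a synergistic effect between the two. In addition, for the component that generates backchannels and similar responses from speech, we studied a model that generates shared laughter in sync with the other party's laughter. Because dialogue systems based on End-to-End (Seq-to-Seq) models tend to generate monotonous, safe responses, we also studied training methods for generating diverse responses.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Progress was made on each research theme, and the results were reported in papers and other publications.
Strategy for Future Research Activity	We will further develop the component technologies and integrate them into a spoken dialogue system.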

Research Products
(8 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results) / Presentation (6 results) (of which Int'l Joint Research: 6 results)

[Journal Article] Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition (2021)
- Author(s)
  Inaguma Hirofumi, Kawahara Tatsuya
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 29 Pages: 1-15
- DOI
  10.1109/TASLP.2021.3133217
- Peer Reviewed / Open Access
[Journal Article] Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition (2021)
- Author(s)
  Ueno Sei, Mimura Masato, Sakai Shinsuke, Kawahara Tatsuya
- Journal Title
  Acoustical Science and Technology
  Volume: 42 Pages: 333-343
- DOI
  10.1250/ast.42.333
- Peer Reviewed / Open Access
[Presentation] An end-to-end model from speech to clean transcript for parliamentary meetings (2021)
- Author(s)
  M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  APSIPA ASC
- Int'l Joint Research
[Presentation] ASR rescoring and confidence estimation with ELECTRA (2021)
- Author(s)
  H.Futami, H.Inaguma, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Int'l Joint Research
[Presentation] Data augmentation for ASR using TTS via a discrete representation (2021)
- Author(s)
  S.Ueno, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  IEEE Workshop Automatic Speech Recognition & Understanding (ASRU)
- Int'l Joint Research
[Presentation] VAD-free streaming hybrid CTC/Attention ASR for unsegmented recording (2021)
- Author(s)
  H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] StableEmit: Selection probability discount for reducing emission latency of streaming monotonic attention ASR (2021)
- Author(s)
  H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Multi-referenced training for dialogue response generation (2021)
- Author(s)
  T.Zhao and T.Kawahara
- Organizer
  SIGdial Meeting Discourse & Dialogue
- Int'l Joint Research
