2022 Fiscal Year Annual Research Report

End-to-End Model for Task-Independent Speech Understanding and Dialogue

Research Project

Project/Area Number	20H00602
Research Institution	Kyoto University
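Principal Investigator	Kawahara Tatsuya, Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator (Kenkyū-buntansha)	Inoue Koji, Kyoto University, Graduate School of Informatics, Assistant Professor (10838684); Yoshii Kazuyoshi, Kyoto University, Graduate School of Informatics, Associate Professor (20510001)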
Project Period (FY)	2020-04-01 – 2024-03-31
Keywords	Speech understanding / Spoken dialogue / Speech recognition / End-to-End model
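Outline of Annual Research Achievements	For general-purpose speech understanding and dialogue based on End-to-End models, we carried out the following studies from two perspectives: advancing speech recognition and generating dialogue. First, we designed and built an End-to-End model that takes natural spontaneous speech between humans and directly outputs highly readable written-style sentences, making the necessary edits along the way: deleting fillers and speech errors, inserting punctuation and omitted particles, and correcting colloquial expressions. Alongside this model, we proposed a method that assists End-to-End training by pseudo-reconstructing transcripts faithful to the speech, and a speech segmentation method that uses punctuation positions as cues, and demonstrated the effectiveness of each. Evaluation experiments on speech from deliberations of the House of Representatives confirmed that the proposed method generates meeting-record text more accurately and faster than a cascade approach that combines speech recognition with text-based spoken-style conversion. Second, in an End-to-End (Seq-to-Seq) model that generates a system response from the user's input utterance, we integrated emotion recognition and jointly trained a model that reconstructs the input utterance from the response, aiming at response generation accompanied by context understanding and emotion recognition. We confirmed that combining emotion recognition with retrieval-based responses enables empathetic dialogue. Furthermore, we investigated introducing self-supervised learning into speech recognition models, and showed that performing speech recognition together with language identification and domain identification in a single End-to-End model, and using the results of the latter, improves speech recognition accuracy.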
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: In speech recognition and understanding and in dialogue systems, we conducted research from multiple angles and steadily achieved results.
Strategy for Future Research Activity	We will further develop the component technologies and integrate them into a system.

Research Products
(12 results)


All Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results) Presentation (8 results) (of which Int'l Joint Research: 8 results) Book (2 results)

[Journal Article] End-to-End Generation of Written-style Transcript of Speech from Parliamentary Meetings (2023)
- Author(s)
  Mimura Masato, Kawahara Tatsuya
- Journal Title
  Journal of Natural Language Processing
  Volume: 30 Pages: 88-124
- DOI
  10.5715/jnlp.30.88
- Peer Reviewed / Open Access
[Journal Article] TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies (2022)
- Author(s)
  Soky Kak, Mimura Masato, Kawahara Tatsuya, Chu Chenhui, Li Sheng, Ding Chenchen, Sam Sethserey
- Journal Title
  International Journal of Asian Language Processing
  Volume: 31 Pages: 1-21
- DOI
  10.1142/S2717554522500072
- Peer Reviewed / Open Access
[Presentation] Fusing multiple bandwidth spectrograms for improving speech enhancement (2022)
- Author(s)
  H. Shi, Y. Shu, L. Wang, J. Dang, and T. Kawahara
- Organizer
  APSIPA ASC
- Int'l Joint Research
[Presentation] Subband-based spectrogram fusion for speech enhancement by combining mapping and masking approaches (2022)
- Author(s)
  H. Shi, L. Wang, S. Li, J. Dang, and T. Kawahara
- Organizer
  APSIPA ASC
- Int'l Joint Research
[Presentation] Non-autoregressive error correction for CTC-based ASR with phone-conditioned masked LM (2022)
- Author(s)
  H. Futami, H. Inaguma, S. Ueno, M. Mimura, S. Sakai, and T. Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] End-to-end speech-to-punctuated-text recognition (2022)
- Author(s)
  J. Nozaki, T. Kawahara, K. Ishizuka, and T. Hashimoto
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Leveraging simultaneous translation for enhancing transcription of low-resource language via cross attention mechanism (2022)
- Author(s)
  K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Monaural speech enhancement based on spectrogram decomposition for convolutional neural network-sensitive feature extraction (2022)
- Author(s)
  H. Shi, L. Wang, S. Li, J. Dang, and T. Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Selective multi-task learning for speech emotion recognition using corpora of different styles (2022)
- Author(s)
  H. Zhang, M. Mimura, T. Kawahara, and K. Ishizuka
- Organizer
  IEEE-ICASSP
- Int'l Joint Research
[Presentation] Phone-informed refinement of synthesized mel spectrogram for data augmentation in speech recognition (2022)
- Author(s)
  S. Ueno and T. Kawahara
- Organizer
  IEEE-ICASSP
- Int'l Joint Research
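[Book] 音声（下） (Speech, Vol. 2) (2022)
- Author(s)
  Acoustical Society of Japan, Iwano Koji, Kawahara Tatsuya, Shinoda Koichi, Ito Akinori, Masumura Ryo, Ogawa Tetsuji, Komatani Kazunori
- Total Pages
  208
- Publisher
  Corona Publishing (コロナ社)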
- ISBN
  978-4-339-01367-2
[Book] 音声対話システム (Spoken Dialogue Systems) (2022)
- Author(s)
  Inoue Koji, Kawahara Tatsuya
- Total Pages
  272
- Publisher
  Ohmsha (オーム社)
- ISBN
  978-4-274-22954-1
