2020 Fiscal Year Annual Research Report

Research on Mental State Estimation in Spoken Dialogue Based on Hierarchical End-to-End Models

Research Project

Project/Area Number	18J22864
Research Institution	Kyoto University
Principal Investigator	Hirofumi Inaguma (稲熊寛文), Kyoto University, Graduate School of Informatics, Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Keywords	End-to-end speech recognition / Streaming speech recognition / End-to-end speech translation / Non-autoregressive models / Knowledge distillation
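Outline of Annual Research Achievements	Continuing from the previous fiscal year, we worked on online streaming speech recognition, which operates in real time without waiting for the speaker to finish the utterance. We focused on the problem that monotonic chunkwise attention (MoChA), a streaming end-to-end speech recognition model, emits words at inference time later than the corresponding speech was actually uttered. To reduce this latency, we proposed "CTC-synchronous training", a method that uses alignment information obtained from a connectionist temporal classification (CTC) model. This work was accepted at Interspeech 2020 and was further compiled into a journal paper and submitted. We also worked on non-autoregressive models to speed up inference in end-to-end speech translation. To offset the respective weaknesses of autoregressive models, which are accurate but slow at inference, and non-autoregressive models, which are fast but less accurate, we proposed a method that rescores the outputs obtained quickly from the latter using the former; this work was accepted at ICASSP 2021. In addition, we proposed a method that uses two text-based machine translation models to distill knowledge obtained from both the source and target languages into a single end-to-end speech translation model; this work was accepted at NAACL-HLT 2021, a top conference in natural language processing.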
Research Progress Status	Not entered, because FY2020 (Reiwa 2) is the final fiscal year.
Strategy for Future Research Activity	Not entered, because FY2020 (Reiwa 2) is the final fiscal year.

Research Products
(15 results)

Presentation (14 results, of which Int'l Joint Research: 10 results) / Remarks (1 result)

[Presentation] Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder (2021)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Int'l Joint Research
[Presentation] Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Int'l Joint Research
[Presentation] Recent Developments on ESPnet Toolkit Boosted by Conformer (2021)
- Author(s)
  Pengcheng Guo
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Int'l Joint Research
[Presentation] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (2021)
- Author(s)
  Hirofumi Inaguma
- Organizer
  2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
- Int'l Joint Research
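[Presentation] Knowledge Distillation from BERT to Sequence-to-Sequence Speech Recognition (2021)
- Author(s)
  Hayato Futami (二見颯)
- Organizer
  The 246th Natural Language Processing and 134th Spoken Language Processing Joint Research Meeting
[Presentation] End-to-End Speech Recognition with Fast Inference Based on CTC and Mask Prediction (2021)
- Author(s)
  Yosuke Higuchi (樋口陽祐)
- Organizer
  The 246th Natural Language Processing and 134th Spoken Language Processing Joint Research Meeting
[Presentation] Rescoring Speech Recognition Hypotheses with ELECTRA (2021)
- Author(s)
  Hayato Futami (二見颯)
- Organizer
  Acoustical Society of Japan 2021 Spring Meeting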
[Presentation] Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)
- Int'l Joint Research
[Presentation] ESPnet-ST: All-in-One Speech Translation Toolkit (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  The 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020
- Int'l Joint Research
[Presentation] CTC-synchronous Training for Monotonic Attention Model (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  Interspeech 2020
- Int'l Joint Research
[Presentation] Enhancing Monotonic Multihead Attention for Streaming ASR (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  Interspeech 2020
- Int'l Joint Research
[Presentation] Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (2020)
- Author(s)
  Hayato Futami
- Organizer
  Interspeech 2020
- Int'l Joint Research
[Presentation] End-to-end speech-to-dialog-act recognition (2020)
- Author(s)
  Tatsuya Kawahara
- Organizer
  Interspeech 2020
- Int'l Joint Research
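[Presentation] Improving Attention-Based Streaming Speech Recognition with CTC-Synchronous Training (2020)
- Author(s)
  Hirofumi Inaguma (稲熊寛文)
- Organizer
  Acoustical Society of Japan 2020 Autumn Meeting
[Remarks] Personal website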
- URL
  https://hirofumi0810.github.io/