Research on Mental State Estimation in Spoken Dialogue Based on Hierarchical End-to-End Models

Research Project

Project/Area Number	18J22864
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Perceptual information processing
Research Institution	Kyoto University
Principal Investigator	稲熊寛文 (Hirofumi Inaguma), Kyoto University, Graduate School of Informatics, Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,200,000 (Direct Cost: ¥2,200,000); Fiscal Year 2020: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2019: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2018: ¥800,000 (Direct Cost: ¥800,000)
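Keywords	End-to-end speech recognition / Streaming speech recognition / End-to-end speech translation / Non-autoregressive model / Knowledge distillation / Streaming end-to-end speech recognition / Speech recognition / Acoustic-to-word / Language model / Out-of-vocabulary (OOV) word problem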
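Outline of Annual Research Achievements	Continuing from the previous fiscal year, we worked on online streaming speech recognition, which runs in real time without waiting for the speaker to finish the utterance. We focused on the problem that Monotonic chunkwise attention (MoChA), a streaming end-to-end speech recognition model, emits each word at inference time later than the moment at which the corresponding speech was actually uttered. To reduce this latency, we proposed "CTC-synchronous training", a method that uses alignment information obtained from a connectionist temporal classification (CTC) model. This work was accepted at Interspeech 2020 and was also written up and submitted as a journal paper. We also worked on non-autoregressive models in order to speed up inference for end-to-end speech translation. Autoregressive models are accurate but slow at inference, whereas non-autoregressive models are fast but less accurate; to offset these weaknesses, we proposed rescoring the outputs obtained quickly from the latter with the former, and this work was accepted at ICASSP 2021. In addition, we proposed a method that uses two text-based machine translation models to distill knowledge obtained from both the source and target languages into a single end-to-end speech translation model, which was accepted at NAACL-HLT 2021, a top conference in natural language processing.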
Research Progress Status	Not entered, because FY2020 (Reiwa 2) is the final fiscal year.
Strategy for Future Research Activity	Not entered, because FY2020 (Reiwa 2) is the final fiscal year.
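
The rescoring idea mentioned in the outline (a fast non-autoregressive model proposes candidates, a slower autoregressive model picks among them) can be illustrated with a minimal sketch. This is not the project's implementation: the function names `rescore` and `toy_ar_log_prob`, the bigram table, and the candidate list are all hypothetical stand-ins chosen only to make the example self-contained.

```python
import math
from typing import Callable, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def rescore(
    candidates: Sequence[Hypothesis],
    ar_log_prob: Callable[[Hypothesis], float],
) -> Hypothesis:
    """Return the candidate that the autoregressive scorer rates highest.

    `candidates` stands in for outputs produced quickly by a
    non-autoregressive model; `ar_log_prob` stands in for the
    log-probability assigned by an autoregressive model.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=ar_log_prob)


# Hypothetical bigram probabilities playing the role of an
# autoregressive model; the values are made up for illustration.
_BIGRAM = {
    ("<s>", "guten"): 0.9,
    ("guten", "morgen"): 0.8,
    ("guten", "morgan"): 0.1,
    ("morgen", "</s>"): 0.9,
    ("morgan", "</s>"): 0.9,
}


def toy_ar_log_prob(hyp: Hypothesis) -> float:
    """Sum left-to-right bigram log-probabilities over the hypothesis."""
    tokens = ("<s>",) + tuple(hyp) + ("</s>",)
    return sum(
        math.log(_BIGRAM.get(pair, 1e-4)) for pair in zip(tokens, tokens[1:])
    )


# Hypothetical candidates, as if emitted by a non-autoregressive decoder.
candidates = [("guten", "morgan"), ("guten", "morgen")]
best = rescore(candidates, toy_ar_log_prob)
print(" ".join(best))
```

The expensive scorer is called once per candidate rather than once per output token, which is where the speed advantage of this kind of scheme would come from under these assumptions.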

Report

(3 results)

Research Products
(26 results)

All Int'l Joint Research (1 results) Presentation (24 results) (of which Int'l Joint Research: 20 results) Remarks (1 results)

[Int'l Joint Research] Johns Hopkins University (USA)
- Related Report
  2018 Annual Research Report
[Presentation] Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder (2021)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Recent Developments on ESPnet Toolkit Boosted by Conformer (2021)
- Author(s)
  Pengcheng Guo
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (2021)
- Author(s)
  Hirofumi Inaguma
- Organizer
  2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
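[Presentation] Knowledge Distillation from BERT to Sequence-to-Sequence Speech Recognition (2021)
- Author(s)
  二見颯 (Hayato Futami)
- Organizer
  246th Natural Language Processing / 134th Spoken Language Information Processing Joint Research Meeting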
- Related Report
  2020 Annual Research Report
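[Presentation] Fast-Inference End-to-End Speech Recognition Based on CTC and Mask Prediction (2021)
- Author(s)
  樋口陽祐 (Yosuke Higuchi)
- Organizer
  246th Natural Language Processing / 134th Spoken Language Information Processing Joint Research Meeting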
- Related Report
  2020 Annual Research Report
[Presentation] Rescoring Speech Recognition Hypotheses with ELECTRA (2021)
- Author(s)
  二見颯 (Hayato Futami)
- Organizer
  Acoustical Society of Japan 2021 Spring Meeting
- Related Report
  2020 Annual Research Report
[Presentation] MINIMUM LATENCY TRAINING STRATEGIES FOR STREAMING SEQUENCE-TO-SEQUENCE ASR (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)
- Related Report
  2020 Annual Research Report 2019 Annual Research Report
- Int'l Joint Research
[Presentation] ESPnet-ST: All-in-One Speech Translation Toolkit (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  The 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020
- Related Report
  2020 Annual Research Report 2019 Annual Research Report
- Int'l Joint Research
[Presentation] CTC-synchronous Training for Monotonic Attention Model (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  Interspeech 2020
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Enhancing Monotonic Multihead Attention for Streaming ASR (2020)
- Author(s)
  Hirofumi Inaguma
- Organizer
  Interspeech 2020
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (2020)
- Author(s)
  Hayato Futami
- Organizer
  Interspeech 2020
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] End-to-end speech-to-dialog-act recognition (2020)
- Author(s)
  Tatsuya Kawahara
- Organizer
  Interspeech 2020
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
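[Presentation] Improving Attention-Based Streaming Speech Recognition with CTC-Synchronous Training (2020)
- Author(s)
  稲熊寛文 (Hirofumi Inaguma)
- Organizer
  Acoustical Society of Japan 2020 Autumn Meeting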
- Related Report
  2020 Annual Research Report
[Presentation] A Comparative Study on Transformer vs RNN in Speech Applications (2020)
- Author(s)
  Shigeki Karita
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] TRANSFER LEARNING OF LANGUAGE-INDEPENDENT END-TO-END ASR WITH LANGUAGE MODEL FUSION (2019)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] MULTILINGUAL END-TO-END SPEECH TRANSLATION (2019)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] LANGUAGE MODEL INTEGRATION BASED ON MEMORY CONTROL FOR SEQUENCE TO SEQUENCE SPEECH RECOGNITION (2019)
- Author(s)
  Jaejin Cho
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] TRANSFER LEARNING OF LANGUAGE-INDEPENDENT END-TO-END ASR WITH LANGUAGE MODEL FUSION (2019)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] LANGUAGE MODEL INTEGRATION BASED ON MEMORY CONTROL FOR SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION (2019)
- Author(s)
  Jaejin Cho
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] IMPROVING OOV DETECTION AND RESOLUTION WITH EXTERNAL LANGUAGE MODELS IN ACOUSTIC-TO-WORD ASR (2018)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE Workshop on Spoken Language Technology (SLT2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] LEVERAGING SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS FOR ENHANCING ACOUSTIC-TO-WORD SPEECH RECOGNITION (2018)
- Author(s)
  Masato Mimura
- Organizer
  IEEE Workshop on Spoken Language Technology (SLT2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] AN END-TO-END APPROACH TO JOINT SOCIAL SIGNAL DETECTION AND AUTOMATIC SPEECH RECOGNITION (2018)
- Author(s)
  Hirofumi Inaguma
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] ACOUSTIC-TO-WORD ATTENTION-BASED MODEL COMPLEMENTED WITH CHARACTER-LEVEL CTC-BASED MODEL (2018)
- Author(s)
  Sei Ueno
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Remarks] Personal website
- URL
  https://hirofumi0810.github.io/
- Related Report
  2020 Annual Research Report
