
Research on Mental State Estimation in Spoken Dialogue Based on Hierarchical End-to-End Models

Research Project

Project/Area Number 18J22864
Research Category

Grant-in-Aid for JSPS Fellows

Allocation Type Single-year Grants
Section Domestic
Research Field Perceptual information processing
Research Institution Kyoto University
Research Fellow Hirofumi Inaguma (稲熊 寛文), Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC1)
Project Period (FY) 2018-04-25 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥2,200,000 (Direct Cost: ¥2,200,000)
Fiscal Year 2020: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 2019: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 2018: ¥800,000 (Direct Cost: ¥800,000)
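Keywords End-to-end speech recognition / streaming speech recognition / End-to-end speech translation / non-autoregressive model / knowledge distillation / streaming end-to-end speech recognition / end-to-end speech translation / speech recognition / Acoustic-to-word / End-to-End speech recognition / language model / out-of-vocabulary (OOV) problem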
Outline of Annual Research Achievements

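Continuing from the previous year, I worked on online streaming speech recognition, which runs in real time without waiting for the speaker to finish the utterance. I focused on a problem with monotonic chunkwise attention (MoChA), a streaming end-to-end speech recognition model: at inference time it emits each word later than the corresponding speech was actually spoken. To reduce this latency, I proposed "CTC-synchronous training", which uses alignment information obtained from a connectionist temporal classification (CTC) model. This work was accepted to Interspeech 2020, and an extended version has been submitted as a journal paper.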
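
A minimal Python sketch of the synchronization idea follows. It assumes that each MoChA output token has a row of expected alignment probabilities over encoder frames, that a token's expected boundary is the probability-weighted frame index, and that the CTC spike frame of the same token serves as the reference; the penalty is the mean absolute gap between the two. The function names and the exact form of the loss are illustrative assumptions, not the published formulation.

```python
from typing import List, Sequence


def expected_boundary(alpha_row: Sequence[float]) -> float:
    """Expected frame index at which a token is emitted: sum_j j * alpha[j]."""
    return sum(j * a for j, a in enumerate(alpha_row))


def ctc_sync_loss(alpha: Sequence[Sequence[float]], ctc_spikes: Sequence[int]) -> float:
    """Illustrative CTC-synchronization penalty (assumed form, hypothetical names).

    alpha:      L x T expected alignment probabilities from the monotonic
                attention model, one row per output token.
    ctc_spikes: length-L frame indices where CTC emits the same tokens.
    Returns the mean absolute distance between the attention model's expected
    boundaries and the CTC spike positions; pulling the two together is meant
    to keep the attention model from emitting tokens later than CTC does.
    """
    if len(alpha) != len(ctc_spikes):
        raise ValueError("expected one CTC spike per output token")
    if not alpha:
        return 0.0
    gaps: List[float] = [
        abs(spike - expected_boundary(row)) for row, spike in zip(alpha, ctc_spikes)
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy example: 2 tokens, 5 encoder frames.
    alpha = [[0.0, 0.0, 1.0, 0.0, 0.0],   # token 1 expected at frame 2.0
             [0.0, 0.0, 0.0, 0.5, 0.5]]   # token 2 expected at frame 3.5
    spikes = [1, 3]                       # CTC emits the tokens at frames 1 and 3
    print(ctc_sync_loss(alpha, spikes))   # gaps 1.0 and 0.5 -> 0.75
```

In training, a term like this would presumably be added to the usual recognition losses with a tunable weight; that weighting is likewise an assumption of the sketch.
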
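I also worked on non-autoregressive models to speed up inference in end-to-end speech translation. Autoregressive models are accurate but slow at inference, whereas non-autoregressive models are fast but less accurate. To combine their strengths, I proposed a method in which outputs generated quickly by a non-autoregressive model are rescored by an autoregressive model; this was accepted to ICASSP 2021. In addition, I proposed a method that uses two text-based machine translation models to distill knowledge from both the source and the target languages into a single end-to-end speech translation model; this was accepted to NAACL-HLT 2021, a top conference in natural language processing.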
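
The rescoring scheme described above can be sketched as follows: a fast non-autoregressive decoder proposes several candidate outputs, and the slower autoregressive model only scores those fixed candidates (a single teacher-forced pass per candidate) instead of generating token by token. In this Python sketch, `ar_log_prob` is a hypothetical stand-in for the autoregressive model's sequence log-probability, and the optional length normalization is an assumption, not a detail taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def rescore(
    candidates: Sequence[Tokens],
    ar_log_prob: Callable[[Tokens], float],
    length_norm: bool = True,
) -> Tuple[Tokens, float]:
    """Return the NAR candidate that the (hypothetical) AR scorer rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def score(cand: Tokens) -> float:
        lp = ar_log_prob(cand)
        # Optional length normalization (assumption) so longer outputs are
        # not penalized merely for containing more tokens.
        return lp / max(len(cand), 1) if length_norm else lp

    best = max(candidates, key=score)
    return list(best), score(best)


def toy_scorer(tokens: Tokens) -> float:
    # Hypothetical log-probabilities standing in for a real AR model.
    table = {("a", "b"): -2.0, ("a", "b", "c"): -2.4, ("x",): -3.0}
    return table[tuple(tokens)]


if __name__ == "__main__":
    nar_outputs = [["a", "b"], ["a", "b", "c"], ["x"]]
    print(rescore(nar_outputs, toy_scorer, length_norm=False))  # (['a', 'b'], -2.0)
    print(rescore(nar_outputs, toy_scorer)[0])  # ['a', 'b', 'c'] after normalization
```

Because every candidate is scored independently, the AR passes can run as one batch, which is where the speed advantage over ordinary autoregressive search would come from.
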

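The distillation mentioned in the outline above can be illustrated with the generic word-level knowledge-distillation objective: the student's loss interpolates the usual negative log-likelihood of the reference token with the cross-entropy against a teacher model's output distribution. In the two-teacher setting, one could, for instance, apply it to target-side outputs with a source-to-target MT teacher and to an auxiliary source-side output with a target-to-source MT teacher. This is a swapped-in textbook formulation under stated assumptions (the function name, `kd_weight`, and the two-teacher pairing are all hypothetical), not the method of the NAACL-HLT 2021 paper, which may distill at a different granularity.

```python
import math
from typing import Sequence


def kd_token_loss(
    student_probs: Sequence[float],
    teacher_probs: Sequence[float],
    gold_index: int,
    kd_weight: float = 0.5,
) -> float:
    """Generic word-level KD loss for one output position (hypothetical helper).

    (1 - kd_weight) * NLL(reference token)
      + kd_weight * CE(teacher distribution, student distribution)
    """
    if len(student_probs) != len(teacher_probs):
        raise ValueError("student and teacher must share a vocabulary")
    if not 0.0 <= kd_weight <= 1.0:
        raise ValueError("kd_weight must be in [0, 1]")
    eps = 1e-12  # guards against log(0)
    nll = -math.log(max(student_probs[gold_index], eps))
    ce = -sum(t * math.log(max(s, eps)) for t, s in zip(teacher_probs, student_probs))
    return (1.0 - kd_weight) * nll + kd_weight * ce


if __name__ == "__main__":
    student = [0.7, 0.2, 0.1]  # speech-translation student at one step (toy values)
    teacher = [0.6, 0.3, 0.1]  # e.g. a text MT teacher at the same step (toy values)
    print(kd_token_loss(student, teacher, gold_index=0))
```

Setting `kd_weight` to 0 recovers ordinary maximum-likelihood training, and setting it to 1 trains purely on the teacher's soft targets.
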
Research Progress Status

Not entered, since FY2020 (Reiwa 2) is the final year of the project.

Strategy for Future Research Activity

Not entered, since FY2020 (Reiwa 2) is the final year of the project.

Report

(3 results)
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • 2018 Annual Research Report

Research Products

(26 results)


Int'l Joint Research (1 result) / Presentation (24 results; of which Int'l Joint Research: 20 results) / Remarks (1 result)

  • [Int'l Joint Research] Johns Hopkins University (USA)

    • Related Report
      2018 Annual Research Report
  • [Presentation] Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder (2021)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2021)

    • Author(s)
      Yosuke Higuchi
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Recent Developments on ESPnet Toolkit Boosted by Conformer (2021)

    • Author(s)
      Pengcheng Guo
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (2021)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
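  • [Presentation] Knowledge Distillation from BERT to Sequence-to-Sequence Speech Recognition (2021)

    • Author(s)
      Hayato Futami
    • Organizer
      Joint Meeting of the 246th SIG on Natural Language Processing and the 134th SIG on Spoken Language Processing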
    • Related Report
      2020 Annual Research Report
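  • [Presentation] Fast-Inference End-to-End Speech Recognition Based on CTC and Mask Prediction (2021)

    • Author(s)
      Yosuke Higuchi
    • Organizer
      Joint Meeting of the 246th SIG on Natural Language Processing and the 134th SIG on Spoken Language Processing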
    • Related Report
      2020 Annual Research Report
  • [Presentation] Rescoring Speech Recognition Hypotheses with ELECTRA (2021)

    • Author(s)
      Hayato Futami
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR (2020)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)
    • Related Report
      2020 Annual Research Report, 2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] ESPnet-ST: All-in-One Speech Translation Toolkit (2020)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      The 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020
    • Related Report
      2020 Annual Research Report, 2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] CTC-synchronous Training for Monotonic Attention Model (2020)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Enhancing Monotonic Multihead Attention for Streaming ASR (2020)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Distilling the Knowledge of BERT for Sequence-to-Sequence ASR (2020)

    • Author(s)
      Hayato Futami
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] End-to-end speech-to-dialog-act recognition (2020)

    • Author(s)
      Tatsuya Kawahara
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
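  • [Presentation] Improving Attention-Based Streaming Speech Recognition with CTC-Synchronous Training (2020)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      Acoustical Society of Japan, 2020 Autumn Meeting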
    • Related Report
      2020 Annual Research Report
  • [Presentation] A Comparative Study on Transformer vs RNN in Speech Applications (2020)

    • Author(s)
      Shigeki Karita
    • Organizer
      IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Transfer Learning of Language-Independent End-to-End ASR with Language Model Fusion (2019)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multilingual End-to-End Speech Translation (2019)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition (2019)

    • Author(s)
      Jaejin Cho
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Transfer Learning of Language-Independent End-to-End ASR with Language Model Fusion (2019)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Language Model Integration Based on Memory Control for Sequence-to-Sequence Speech Recognition (2019)

    • Author(s)
      Jaejin Cho
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR (2018)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE Workshop on Spoken Language Technology (SLT2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition (2018)

    • Author(s)
      Masato Mimura
    • Organizer
      IEEE Workshop on Spoken Language Technology (SLT2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] An End-to-End Approach to Joint Social Signal Detection and Automatic Speech Recognition (2018)

    • Author(s)
      Hirofumi Inaguma
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Acoustic-to-Word Attention-Based Model Complemented with Character-Level CTC-Based Model (2018)

    • Author(s)
      Sei Ueno
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Remarks] Personal homepage

    • URL

      https://hirofumi0810.github.io/

    • Related Report
      2020 Annual Research Report


Published: 2018-05-01   Modified: 2021-12-27  
