Research Toward Highly Practical End-to-End Speech Recognition

Research Project

Project/Area Number	22KJ2898
Project/Area Number (Other)	21J23495 (2021-2022)
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund (2023) Single-year Grants (2021-2022)
Section	Domestic
Review Section	Basic Section 61010:Perceptual information processing-related
Research Institution	Waseda University
Principal Investigator	Yosuke Higuchi, Waseda University, Faculty of Science and Engineering, JSPS Research Fellow (DC1)
Project Period (FY)	2023-03-08 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥2,200,000 (Direct Cost: ¥2,200,000); Fiscal Year 2023: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2022: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2021: ¥800,000 (Direct Cost: ¥800,000)
Keywords	speech recognition / natural language processing
Outline of Research at the Start	Not entered because this is a continuing project.
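Outline of Annual Research Achievements	This project aims to develop speech recognition systems that run both fast and accurately, in order to make voice-based interfaces more practical. In prior work, we built a non-autoregressive End-to-End speech recognition system based on a masked language model, which substantially improved inference speed while maintaining recognition accuracy comparable to conventional autoregressive systems. As applications of the proposed system, we also examined the use of pre-trained masked language models and an extension to streaming speech recognition, and obtained promising results for further gains in accuracy and functionality. This fiscal year, building on those results, we focused on generative language models, which are advancing rapidly in natural language processing, and worked on new speech recognition techniques. Specifically, we attempted to improve the performance of End-to-End speech recognition systems by exploiting the general-purpose linguistic knowledge obtained from generative language models. Recent generative language models such as ChatGPT, when fine-tuned with prompts that contain natural-language instructions, show high generalization performance across a wide range of natural language processing tasks. In this study, we had a generative language model solve a grammatical error correction task on speech recognition hypotheses, and used the resulting feature representations to train sequence generation in an End-to-End speech recognition model. Evaluation experiments on multiple speech recognition datasets confirmed that the proposed method achieves high recognition accuracy. On the other hand, the drop in inference speed caused by using a large generative language model emerged as an open issue. This work is currently under submission to a peer-reviewed international conference. Beyond these main results, we also worked on extensions to streaming speech recognition and on fast inference algorithms. These results have been accepted at international conferences as co-authored papers.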

Report

(3 results)

Research Products
(27 results)


Presentation (27 results) (of which Int'l Joint Research: 20 results)

[Presentation] End-to-End Speech Recognition via Hierarchical Multi-Task Learning with Recursive Feedback (2024)
- Author(s)
  楠奈穂美
- Organizer
  2024 Spring Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
[Presentation] CTC Alignments Improve Autoregressive Translation (2023)
- Author(s)
  Brian Yan
- Organizer
  Proc. EACL2023
- Related Report
  2023 Annual Research Report 2022 Annual Research Report
- Int'l Joint Research
[Presentation] InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss (2023)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ICASSP2023
- Related Report
  2023 Annual Research Report 2022 Annual Research Report
- Int'l Joint Research
[Presentation] BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder (2023)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ICASSP2023
- Related Report
  2023 Annual Research Report 2022 Annual Research Report
- Int'l Joint Research
[Presentation] Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition (2023)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. EUSIPCO2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Mask-Conformer: Augmenting Conformer with Mask-Predict Decoder (2023)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ASRU2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference (2023)
- Author(s)
  Masao Someki
- Organizer
  Proc. ASRU2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] End-to-End Speech Recognition Using Pre-trained Masked Language Models (2023)
- Author(s)
  Yosuke Higuchi
- Organizer
  2023 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
[Presentation] A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding (2023)
- Author(s)
  Yifan Peng
- Organizer
  Proc. SLT2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy (2022)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ICASSP2022
- Related Report
  2022 Annual Research Report 2021 Annual Research Report
- Int'l Joint Research
[Presentation] Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units (2022)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ICASSP2022
- Related Report
  2022 Annual Research Report 2021 Annual Research Report
- Int'l Joint Research
[Presentation] Improving Non-Autoregressive End-to-End Speech Recognition with Pre-trained Acoustic and Language Models (2022)
- Author(s)
  Keqi Deng
- Organizer
  Proc. ICASSP2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] ESPnet-ONNX: Bridging a Gap Between Research and Production (2022)
- Author(s)
  Masao Someki
- Organizer
  Proc. APSIPA2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2022)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. Findings EMNLP2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Pre-training with Mask-CTC for Transducer-based Streaming Speech Recognition (2022)
- Author(s)
  Huaibo Zhao
- Organizer
  142nd IPSJ SIG Technical Meeting on Spoken Language Processing (SIG-SLP)
- Related Report
  2022 Annual Research Report
[Presentation] Improving Non-Autoregressive End-to-End Speech Recognition with Pre-trained Acoustic and Language Models (2022)
- Author(s)
  Keqi Deng
- Organizer
  Proc. ICASSP2022
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Semi-Supervised End-to-End Speech Recognition with Momentum Pseudo-Labeling (2022)
- Author(s)
  Yosuke Higuchi
- Organizer
  2022 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2021 Annual Research Report
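[Presentation] Hierarchical Conditional End-to-End Speech Recognition Based on Subword Units of Different Granularities (2022)
- Author(s)
  Yosuke Higuchi
- Organizer
  2022 Autumn Meeting of the Acoustical Society of Japan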
- Related Report
  2021 Annual Research Report
[Presentation] Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ICASSP2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder (2021)
- Author(s)
  Hirofumi Inaguma
- Organizer
  Proc. ICASSP2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Recent Developments on ESPnet Toolkit Boosted by Conformer (2021)
- Author(s)
  Pengcheng Guo
- Organizer
  Proc. ICASSP2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans (2021)
- Author(s)
  Shinji Watanabe
- Organizer
  Proc. DSLW2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. Interspeech2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR (2021)
- Author(s)
  Huaibo Zhao
- Organizer
  Proc. APSIPA2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  Proc. ASRU2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Pre-training with Mask-CTC for Triggered Attention-based Streaming Speech Recognition (2021)
- Author(s)
  Huaibo Zhao
- Organizer
  138th IPSJ SIG Technical Meeting on Spoken Language Processing (SIG-SLP)
- Related Report
  2021 Annual Research Report
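[Presentation] Hierarchical Conditioning Based on Subword Units of Different Granularities for End-to-End Speech Recognition (2021)
- Author(s)
  Yosuke Higuchi
- Organizer
  139th IPSJ SIG Technical Meeting on Spoken Language Processing (SIG-SLP)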
- Related Report
  2021 Annual Research Report
