
Automatic acquisition of optimized acoustic model unit for automatic speech recognition using deep learning

Research Project

Project/Area Number 19K12027
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 61010:Perceptual information processing-related
Research Institution Chubu University

Principal Investigator

Yamamoto Kazumasa  Chubu University, Faculty of Engineering, Professor (40324230)

Co-Investigator (Kenkyū-buntansha) Nishizaki Hiromitsu  University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Associate Professor (40362082)
Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2019: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords Speech recognition / Acoustic model / Deep learning / Model unit / Acoustic clustering / Multilingual / Multi-task learning / End-to-End / Phoneme model / Phone model / Clustering
Outline of Research at the Start

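With the introduction of deep learning, automatic speech recognition technology has advanced greatly and has entered the stage of practical use. However, Japanese speech recognition currently performs worse than English speech recognition. One likely cause is that the acoustic model units in use are not suited to the language.
This proposed research therefore aims to improve the performance of Japanese speech recognition by automatically acquiring acoustic model units using the latest deep learning techniques. Specifically, it builds on two foundations: higher accuracy obtained by combining conventional automatic acoustic model unit acquisition with deep learning, and automatic mapping of multilingual phone models to Japanese phonemes. By combining these, we aim to acquire better acoustic model units.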

Outline of Final Research Achievements

In this research, we aimed to acquire acoustic model units automatically using the latest deep learning technology in order to improve the performance of Japanese speech recognition. The research was divided into two sub-themes: "(1) automatic acquisition of model units by clustering using deep learning" and "(2) disambiguation of phone-phoneme mapping by using groups of multilingual phone models". In sub-theme (1), state clustering in a DNN-HMM acoustic model improved recognition accuracy, an improvement that could not be obtained with conventional context-dependent phonetic clustering. In sub-theme (2), we found that in multilingual (code-switching) speech recognition, accuracy improves more with acoustic modeling that absorbs differences between speakers and languages than with a separate phonetic model unit for each language.

Academic Significance and Societal Importance of the Research Achievements

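Recent advances in deep learning have greatly improved the performance of automatic speech recognition, and it is now widely used in practice as the input interface for voice AI assistants. However, Japanese speech recognition performs somewhat worse than English speech recognition, which is thought to be one reason why voice input systems are used less often in Japanese than in English-speaking regions. The academic significance of this research lies in aiming to improve the fundamental performance of Japanese speech recognition systems. Its societal significance lies in making it possible to provide voice input systems with high recognition accuracy even for elderly people, who are particularly susceptible to the digital divide.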

Report

(4 results)
  • 2021 Annual Research Report   Final Research Report ( PDF )
  • 2020 Research-status Report
  • 2019 Research-status Report
  • Research Products

    (9 results)


Journal Article (6 results) (of which Peer Reviewed: 6 results, Open Access: 1 result) Presentation (3 results)

  • [Journal Article] Improvement of Elderly Speech Recognition Using Gammatone Filterbank Adaptation (2021)

    • Author(s)
      Kazumasa Yamamoto, Akinori Ishiki, Seiichi Nakagawa
    • Journal Title

      Proceedings of 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE)

      Volume: - Pages: 327-328

    • DOI

      10.1109/gcce53005.2021.9622086

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] ExKaldi-RT: A Real-Time Automatic Speech Recognition Extension Toolkit of Kaldi (2021)

    • Author(s)
      Wang Yu, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki
    • Journal Title

      Proceedings of 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE)

      Volume: - Pages: 346-350

    • DOI

      10.1109/gcce53005.2021.9621992

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Language and Speaker-Independent Feature Transformation for End-to-End Multilingual Speech Recognition (2021)

    • Author(s)
      Tomoaki Hayakawa, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki
    • Journal Title

      Proceedings of INTERSPEECH2021

      Volume: - Pages: 2431-2435

    • DOI

      10.21437/interspeech.2021-390

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Effectiveness of Fine Linear Frequency Spectral Feature for Acoustic Event Detection (2020)

    • Author(s)
      Kazumasa Yamamoto, Ryo Yamamoto, Seiichi Nakagawa
    • Journal Title

      Proceedings of 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE)

      Volume: - Pages: 923-924

    • DOI

      10.1109/gcce50665.2020.9291954

    • Related Report
      2020 Research-status Report
    • Peer Reviewed
  • [Journal Article] Audio Classification of Bit-Representation Waveform (2019)

    • Author(s)
      Okawa Masaki, Saito Takuya, Sawada Naoki, Nishizaki Hiromitsu
    • Journal Title

      Proceedings of the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH2019)

      Volume: - Pages: 2553-2557

    • DOI

      10.21437/interspeech.2019-1855

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition (2019)

    • Author(s)
      Meiko Fukuda, Ryota Nishimura, Hiromitsu Nishizaki, Yurie Iribe, Norihide Kitaoka
    • Journal Title

      Proceedings of the 22nd Conference of the Oriental COCOSDA (International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques) (O-COCOSDA 2019)

      Volume: -

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
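  • [Presentation] Acoustic Characteristics of the Super-Elderly in the Super-Elderly Speech Corpus EARS (2021)

    • Author(s)
      Meiko Fukuda, Ryota Nishimura, Hiromitsu Nishizaki, Yurie Iribe, Kazumasa Yamamoto, Norihide Kitaoka
    • Organizer
      Acoustical Society of Japan, 2021 Autumn Meeting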
    • Related Report
      2021 Annual Research Report
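  • [Presentation] A Study of Various Multi-Task Learning Approaches for End-to-End Multilingual Speech Recognition Models (2020)

    • Author(s)
      Tomoaki Hayakawa, Hiromitsu Nishizaki, Kazumasa Yamamoto, Akio Kobayashi, Takehito Utsuro
    • Organizer
      Acoustical Society of Japan, 2020 Autumn Meeting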
    • Related Report
      2020 Research-status Report
  • [Presentation] Development and Evaluation of Kaldi Extension Tools with Python (2019)

    • Author(s)
      Yu Wang, Hiromitsu Nishizaki, Akio Kobayashi, Takehito Utsuro, Chee Siang Leow
    • Organizer
      Information Processing Society of Japan (IPSJ), Special Interest Group on Spoken Language Processing, 2019-SLP-130(5)
    • Related Report
      2019 Research-status Report


Published: 2019-04-18   Modified: 2023-01-30  
