
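Speech recognition robust to diverse environmental and speaking variations based on discriminative feature extraction and probabilistic models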

Research Project

Project/Area Number 15K16020
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution Nagaoka University of Technology

Principal Investigator

Longbiao Wang (王 龍標), Nagaoka University of Technology, Graduate School of Engineering, Associate Professor (30510458)

Project Period (FY) 2015-04-01 – 2017-03-31
Project Status Discontinued (Fiscal Year 2016)
Budget Amount
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2017: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2016: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords speech recognition / deep learning / feature adaptation
Outline of Annual Research Achievements

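This project studied a high-accuracy speech recognition method for speech with diverse speaking environments, speaking styles, and accents. The method integrates discriminative feature extraction with probabilistic models while normalizing environmental and speaking variations. Specifically, in FY2015 we carried out: (1) preparation of an English speech database covering diverse environments and speaking styles; (2) discriminative feature extraction based on deep learning, jointly optimizing the removal of environmental and speaking variations and the discriminative feature transformation; and (3) speech recognition with PLDA (probabilistic linear discriminant analysis)-HMM, which reduces the adverse effects of diverse environmental and speaking variations on recognition.
In FY2016 we worked on: (1) multichannel feature adaptation in noisy environments; and (2) speech recognition robust to utterances by heavily accented non-native speakers. For (1), the speech recognition rate (word accuracy) under adverse conditions was raised from about 60% with conventional methods to over 80%, a level suitable for practical use. For (2), with the aim of improving recognition accuracy for non-native speakers, we proposed an acoustic model training method adapted to non-native speakers and a feature transformation method based on deep learning. Because non-native speech recognition is a low-resource condition, we used the subspace Gaussian mixture model (SGMM) as the acoustic model. Furthermore, when several different types of speech are used as training data, an SGMM can be trained in a way that accounts for the differences between them, so we proposed a training method that uses both native and non-native speech (cross-accent SGMM). We also proposed a method that uses deep learning as a feature transformer. These methods were evaluated in non-native speech recognition experiments and substantially improved recognition accuracy.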
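For reference, the word accuracy figure quoted in the outline is conventionally defined as (N - S - D - I) / N, where N is the number of reference words and S, D, and I are the substitution, deletion, and insertion counts from a minimum-edit alignment of the recognizer output against the reference. The report does not say which scoring tool was used, so the following is only a minimal sketch of that conventional definition; the function names `align_counts` and `word_accuracy` and the sample sentences are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total errors, S, D, I) for aligning ref[:i] with hyp[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # match, no error
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = cost[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # Tuples compare by total error first, so this keeps a minimum-edit path.
            cost[i][j] = min(diag, deletion, insertion)
    _, s, d, n = cost[-1][-1]
    return s, d, n


def word_accuracy(ref, hyp):
    """Word accuracy (N - S - D - I) / N; assumes a non-empty reference."""
    s, d, i = align_counts(ref, hyp)
    return (len(ref) - s - d - i) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    hypothesis = "the cat sit on mat".split()
    print(align_counts(reference, hypothesis))
    print(round(word_accuracy(reference, hypothesis), 3))
```

Because insertions are penalized, word accuracy can fall below zero for very noisy output, which distinguishes it from the word correct rate (N - S - D) / N.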

Report

(2 results)
  • 2016 Annual Research Report
  • 2015 Research-status Report
  • Research Products

    (12 results)


All Int'l Joint Research (3 results) Journal Article (5 results) (of which Int'l Joint Research: 4 results,  Peer Reviewed: 5 results,  Open Access: 5 results,  Acknowledgement Compliant: 3 results) Presentation (4 results) (of which Int'l Joint Research: 4 results)

  • [Int'l Joint Research] Nanyang Technological University / Institute for Infocomm Research (Singapore)

    • Related Report
      2016 Annual Research Report
  • [Int'l Joint Research] University of Edinburgh (UK)

    • Related Report
      2016 Annual Research Report
  • [Int'l Joint Research] Tsinghua University (China)

    • Related Report
      2016 Annual Research Report
  • [Journal Article] Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization (2016)

    • Author(s)
      Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Xiong Xiao, EngSiong Chng, Haizhou Li
    • Journal Title

      Journal of Signal Processing Systems

      Volume: 82 Issue: 2 Pages: 151-161

    • DOI

      10.1007/s11265-015-1007-3

    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Environment-dependent denoising autoencoder for distant-talking speech recognition (2015)

    • Author(s)
      Y. Ueda, L. Wang, A. Kai, B. Ren
    • Journal Title

      Eurasip Journal on Advances in Signal Processing

      Volume: 2015:92 Issue: 1 Pages: 1-11

    • DOI

      10.1186/s13634-015-0278-y

    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access / Acknowledgement Compliant
  • [Journal Article] Distant-talking accent recognition by combining GMM and DNN (2015)

    • Author(s)
      K. Phapatanaburi, L. Wang, R. Sakagami, Z. Zhang, X. Li, M. Iwahashi
    • Journal Title

      Multimedia Tools and Applications

      Volume: 74 Issue: 9 Pages: 1-16

    • DOI

      10.1007/s11042-015-2935-4

    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
  • [Journal Article] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition (2015)

    • Author(s)
      B. Ren, L. Wang, L. Lu, Y. Ueda, A. Kai
    • Journal Title

      Multimedia Tools and Applications

      Volume: 74 Issue: 9 Pages: 1-16

    • DOI

      10.1007/s11042-015-2849-1

    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
  • [Journal Article] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification (2015)

    • Author(s)
      Z. Zhang, L. Wang, A. Kai, K. Odani, W. Li, M. Iwahashi
    • Journal Title

      Eurasip Journal on Audio, Music and Speech Processing

      Volume: 2015:12 Issue: 1 Pages: 1-13

    • DOI

      10.1186/s13636-015-0056-7

    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] DNN-based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification (2016)

    • Author(s)
      Z. Oo, Y. Kawakami, L. Wang, S. Nakagawa, X. Xiao, M. Iwahashi
    • Organizer
      Interspeech
    • Place of Presentation
      San Francisco, USA
    • Year and Date
      2016-09-08
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Speech selection and environmental adaptation for asynchronous speech recognition (2015)

    • Author(s)
      Bo Ren, L. Wang, Y. Ueda, A. Kai, Z. Zhang
    • Organizer
      APSIPA
    • Place of Presentation
      Hong Kong
    • Year and Date
      2015-12-16
    • Related Report
      2015 Research-status Report
    • Int'l Joint Research
  • [Presentation] Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction (2015)

    • Author(s)
      Shengkui Zhao, Xiong Xiao, Zhaofeng Zhang, Thi Ngoc Tho Nguyen, Xionghu Zhong, Bo Ren, Longbiao Wang, Douglas L. Jones, Eng Siong Chng, Haizhou Li
    • Organizer
      ASRU
    • Place of Presentation
      Scottsdale, Arizona, USA
    • Year and Date
      2015-12-13
    • Related Report
      2015 Research-status Report
    • Int'l Joint Research
  • [Presentation] Relative phase information for detecting human speech and spoofed speech (2015)

    • Author(s)
      L. Wang, Y. Yoshida, Y. Kawakami, S. Nakagawa
    • Organizer
      Interspeech
    • Place of Presentation
      Dresden, Germany
    • Year and Date
      2015-09-06
    • Related Report
      2015 Research-status Report
    • Int'l Joint Research


Published: 2015-04-16   Modified: 2022-02-16  
