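Speech recognition robust to diverse environmental and speaking variations based on discriminative feature extraction and probabilistic models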

Research Project

Project/Area Number	15K16020
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	Nagaoka University of Technology
Principal Investigator	王 龍標 (Longbiao Wang), Nagaoka University of Technology, Graduate School of Engineering, Associate Professor (30510458)
Project Period (FY)	2015-04-01 – 2017-03-31
Project Status	Discontinued (Fiscal Year 2016)
Budget Amount	¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000); Fiscal Year 2017: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2016: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords	speech recognition / deep learning / feature adaptation
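Outline of Annual Research Achievements	This project studied a high-accuracy speech recognition method that integrates discriminative feature extraction with a probabilistic model while normalizing environmental and speaking variations, targeting speech from diverse speaking environments, speaking styles, and accents. Specifically, in FY2015 we (1) prepared an English speech database covering diverse environments and speaking styles, (2) performed discriminative feature extraction based on the joint optimization, using deep learning, of the removal of environmental and speaking variations and of the discriminative feature transformation, and (3) performed speech recognition with PLDA (probabilistic linear discriminant analysis)-HMM, which reduces the adverse effects of diverse environmental and speaking variations on recognition. In FY2016 we worked on (1) multichannel feature adaptation in noisy environments and (2) speech recognition robust to heavily accented utterances by non-native speakers. For (1), the speech recognition rate (word accuracy) under adverse conditions rose from the previous level of about 60% to above 80%, a level suitable for practical use. For (2), with the aim of improving recognition accuracy for non-native speakers, we proposed an acoustic model training method adapted to non-native speakers and a feature transformation method based on deep learning. Because non-native speech recognition is a low-resource condition, we used the subspace Gaussian mixture model (SGMM) as the acoustic model. Furthermore, because an SGMM can be trained in a way that accounts for the differences among several types of speech used as training data, we proposed a training method that uses both native and non-native speech (cross-accent SGMM). We also proposed a method that uses deep learning as a feature transformer. These methods were evaluated in non-native speech recognition experiments and substantially improved recognition accuracy.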
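The outline reports results as word accuracy (単語正解精度) but does not say how it was scored. The following is a minimal sketch, assuming the common definition: word accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The function names `edit_distance` and `word_accuracy` are hypothetical and are not taken from the project.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the word list `ref` into the word list `hyp`."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(ref_words, hyp_words):
    """Word accuracy under the assumed definition (N - S - D - I) / N."""
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    ref = "turn on the kitchen light".split()
    hyp = "turn on kitchen lights".split()
    # One deletion ("the") and one substitution ("light" -> "lights")
    # against five reference words.
    print(word_accuracy(ref, hyp))
```

Under this definition, insertions count against the score, so word accuracy can fall below zero when the hypothesis contains many extra words.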

Report

(2 results)

2016 Annual Research Report
2015 Research-status Report

Research Products
(12 results)

Int'l Joint Research (3 results) / Journal Article (5 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 5 results, Open Access: 5 results, Acknowledgement Compliant: 3 results) / Presentation (4 results) (of which Int'l Joint Research: 4 results)

[Int'l Joint Research] Nanyang Technological University / Institute for Infocomm Research (Singapore)
- Related Report
  2016 Annual Research Report
[Int'l Joint Research] University of Edinburgh (UK)
- Related Report
  2016 Annual Research Report
[Int'l Joint Research] Tsinghua University (China)
- Related Report
  2016 Annual Research Report
[Journal Article] Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization (2016)
- Author(s)
  Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Xiong Xiao, EngSiong Chng, Haizhou Li
- Journal Title
  
  Journal of Signal Processing Systems
  
  Volume: 82 Issue: 2 Pages: 151-161
- DOI
  10.1007/s11265-015-1007-3
- Related Report
  2015 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Environment-dependent denoising autoencoder for distant-talking speech recognition (2015)
- Author(s)
  Y. Ueda, L. Wang, A. Kai, B. Ren
- Journal Title
  
  EURASIP Journal on Advances in Signal Processing
  
  Volume: 2015:92 Issue: 1 Pages: 1-11
- DOI
  10.1186/s13634-015-0278-y
- Related Report
  2015 Research-status Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Distant-talking accent recognition by combining GMM and DNN (2015)
- Author(s)
  K. Phapatanaburi, L. Wang, R. Sakagami, Z. Zhang, X. Li, M. Iwahashi
- Journal Title
  
  Multimedia Tools and Applications
  
  Volume: 74 Issue: 9 Pages: 1-16
- DOI
  10.1007/s11042-015-2935-4
- Related Report
  2015 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition (2015)
- Author(s)
  B. Ren, L. Wang, L. Lu, Y. Ueda, A. Kai
- Journal Title
  
  Multimedia Tools and Applications
  
  Volume: 74 Issue: 9 Pages: 1-16
- DOI
  10.1007/s11042-015-2849-1
- Related Report
  2015 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification (2015)
- Author(s)
  Z. Zhang, L. Wang, A. Kai, K. Odani, W. Li, M. Iwahashi
- Journal Title
  
  EURASIP Journal on Audio, Speech, and Music Processing
  
  Volume: 2015:12 Issue: 1 Pages: 1-13
- DOI
  10.1186/s13636-015-0056-7
- Related Report
  2015 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] DNN-based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification (2016)
- Author(s)
  Z. Oo, Y. Kawakami, L. Wang, S. Nakagawa, X. Xiao, M. Iwahashi
- Organizer
  Interspeech
- Place of Presentation
  San Francisco, USA
- Year and Date
  2016-09-08
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Speech selection and environmental adaptation for asynchronous speech recognition (2015)
- Author(s)
  Bo Ren, L. Wang, Y. Ueda, A. Kai, Z. Zhang
- Organizer
  APSIPA
- Place of Presentation
  Hong Kong
- Year and Date
  2015-12-16
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] ROBUST SPEECH RECOGNITION USING BEAMFORMING WITH ADAPTIVE MICROPHONE GAINS AND MULTICHANNEL NOISE REDUCTION (2015)
- Author(s)
  Shengkui Zhao, Xiong Xiao, Zhaofeng Zhang, Thi Ngoc Tho Nguyen, Xionghu Zhong, Bo Ren, Longbiao Wang, Douglas L. Jones, Eng Siong Chng, Haizhou Li
- Organizer
  ASRU
- Place of Presentation
  Scottsdale, Arizona, USA
- Year and Date
  2015-12-13
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] Relative phase information for detecting human speech and spoofed speech (2015)
- Author(s)
  L. Wang, Y. Yoshida, Y. Kawakami, S. Nakagawa
- Organizer
  Interspeech
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06
- Related Report
  2015 Research-status Report
- Int'l Joint Research
