2015 Fiscal Year Research-status Report

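Speech recognition robust to diverse environmental and speaking-style variation, based on discriminative feature extraction and probabilistic models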

Research Project

Project/Area Number	15K16020
Research Institution	Nagaoka University of Technology
Principal Investigator	Longbiao Wang (王龍標), Nagaoka University of Technology, Graduate School of Engineering, Associate Professor (30510458)
Project Period (FY)	2015-04-01 – 2017-03-31
Keywords	speech recognition / deep learning
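Outline of Annual Research Achievements	Establishing a method that efficiently removes environmental variation and speaking-style variation at the same time is essential for real-world use, and it is the goal of this research. Specifically, the project aims at practical speech recognition in real environments by integrating the advantages of feature-based normalization of environmental and speaking-style variation using deep learning, extraction of easily discriminable features, model-based removal of environmental and speaking-style variation using PLDA (probabilistic linear discriminant analysis), and discriminative model training and adaptation. The main results are summarized below. (1) Noise and reverberation suppression: Focusing on the denoising autoencoder (DAE) as front-end processing for speech recognition in real environments, we proposed a noise and reverberation suppression method that is robust to environmental variation by combining a conventional DAE with automatic estimation of late reverberation. We also extracted bottleneck features with a deep neural network (DNN) and proposed preprocessing that is robust in real environments. The word error rate improved from 25.9% with the conventional method to 16.2% with the proposed method, a relative reduction of about 37%. (2) Speech recognition with PLDA: We applied PLDA-HMM (hidden Markov model) as the acoustic model for speech recognition in real environments. It outperformed GMM-HMM (Gaussian mixture model-HMM), confirming that it helps remove environmental and speaking-style variation. Combining the PLDA-HMM acoustic model with DNN-based discriminative features improved the results further. As research output, we published three international journal papers and several papers at international and domestic conferences.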
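Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: As originally planned, in FY2015 we (1) prepared an English speech database covering diverse environments and speaking styles, (2) carried out discriminative feature extraction based on the joint optimization of deep-learning-based removal of environmental and speaking-style variation and discriminative feature transformation, and (3) performed speech recognition with PLDA-HMM. The proposed methods gave substantially better recognition results than the conventional methods, confirming their effectiveness. The results were also better than originally planned, and they were published in several international journal papers and presented at domestic and international conferences. For these reasons, the research has so far progressed beyond the original plan.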
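Strategy for Future Research Activity	From FY2016 onward, we will continue to record English speech and to study discriminative feature extraction and probabilistic models. In parallel, we will study (1) automatic adaptation of acoustic models and (2) recognition of heavily accented non-native speech. The details are as follows. (1) Automatic adaptation of acoustic models: When the environmental and speaking-style variation at evaluation time differs from that at training time, it is very important to update (adapt) the acoustic model parameters to match the evaluation data. We will therefore automatically recognize the speaker, environment, and speaking style, update the acoustic model parameters that depend on environmental and speaking-style variation, and adapt the model. We will also study joint adaptation of the probabilistic models that takes the various sources of variation into account simultaneously so as to maximize recognition accuracy. In addition, we will propose a discriminative linear regression method in the feature space and study feature-space transformation (adaptation) methods that keep features easy to discriminate under new variation. (2) Speech recognition robust to heavily accented non-native speakers: Compared with native speakers, non-native speakers differ in vocal tract length, phonological system, and other respects, so a more flexible recognition mechanism is required. We will therefore consider normalizing the feature space of non-native speakers to that of native speakers in order to suppress the effect of non-native mispronunciations. Specifically, building on state-of-the-art DNN-based speech recognition, we will classify and learn transformation matrices from the non-native feature space to the native feature space according to phonological distance from the native language, and apply the most suitable transformation to each utterance.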
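Causes of Carryover	To use the research funds efficiently, the principal investigator made use of some existing research equipment and prepared the research speech database personally, which left an amount to be carried over to the next fiscal year. In addition, English-language papers were proofread by collaborators before submission, which saved the professional English proofreading fees that had originally been budgeted.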
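Expenditure Plan for Carryover Budget	Because FY2015 produced many research results, the carryover will be used as travel expenses for presenting these results and exchanging information. In addition, to produce further results, we plan to hire student research assistants (RAs) who will work with the principal investigator, using large-scale speech databases, on research and development of speech recognition in various environments.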

Research Products
(8 results)

Journal Article (5 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 5 results, Open Access: 5 results, Acknowledgement Compliant: 3 results) / Presentation (3 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization (2016)
- Author(s)
  Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Xiong Xiao, EngSiong Chng, Haizhou Li
- Journal Title
  Journal of Signal Processing Systems
  Volume: 82 Pages: 151-161
- DOI
  10.1007/s11265-015-1007-3
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Environment-dependent denoising autoencoder for distant-talking speech recognition (2015)
- Author(s)
  Y. Ueda, L. Wang, A. Kai, B. Ren
- Journal Title
  EURASIP Journal on Advances in Signal Processing
  Volume: 2015:92 Pages: 1-11
- DOI
  10.1186/s13634-015-0278-y
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Distant-talking accent recognition by combining GMM and DNN (2015)
- Author(s)
  K. Phapatanaburi, L. Wang, R. Sakagami, Z. Zhang, X. Li, M. Iwahashi
- Journal Title
  Multimedia Tools and Applications
  Volume: 74 Pages: 1-16
- DOI
  10.1007/s11042-015-2935-4
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition (2015)
- Author(s)
  B. Ren, L. Wang, L. Lu, Y. Ueda, A. Kai
- Journal Title
  Multimedia Tools and Applications
  Volume: 74 Pages: 1-16
- DOI
  10.1007/s11042-015-2849-1
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification (2015)
- Author(s)
  Z. Zhang, L. Wang, A. Kai, K. Odani, W. Li, M. Iwahashi
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2015:12 Pages: 1-13
- DOI
  10.1186/s13636-015-0056-7
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Speech selection and environmental adaptation for asynchronous speech recognition (2015)
- Author(s)
  Bo Ren, L. Wang, Y. Ueda, A. Kai, Z. Zhang
- Organizer
  APSIPA
- Place of Presentation
  Hong Kong
- Year and Date
  2015-12-16 – 2015-12-19
- Int'l Joint Research
[Presentation] Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction (2015)
- Author(s)
  Shengkui Zhao, Xiong Xiao, Zhaofeng Zhang, Thi Ngoc Tho Nguyen, Xionghu Zhong, Bo Ren, Longbiao Wang, Douglas L. Jones, Eng Siong Chng, Haizhou Li
- Organizer
  ASRU
- Place of Presentation
  Scottsdale, Arizona, USA
- Year and Date
  2015-12-13 – 2015-12-17
- Int'l Joint Research
[Presentation] Relative phase information for detecting human speech and spoofed speech (2015)
- Author(s)
  L. Wang, Y. Yoshida, Y. Kawakami, S. Nakagawa
- Organizer
  Interspeech
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06 – 2015-09-10
- Int'l Joint Research
