2017 Fiscal Year Annual Research Report

Accurate speech recognition system with deep neural network introducing human auditory characteristic in real environments

Research Project

Project/Area Number	15K00233
Research Institution	Chubu University
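Principal Investigator	Yamamoto Kazumasa (山本一公), Chubu University, College of Engineering, Associate Professor (40324230)
Co-Investigator (Kenkyū-buntansha)	Nakagawa Seiichi (中川聖一), Toyohashi University of Technology, Organization for the Promotion of Leading Graduate School Education, Specially Appointed Professor (20115893)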
Project Period (FY)	2015-04-01 – 2018-03-31
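Keywords	Speech recognition / Deep learning / Deep Neural Network / Auditory characteristics / Acoustic features / Filterbank / Speaker adaptation
Outline of Annual Research Achievements	In speech recognition, acoustic models based on deep learning, namely DNN (Deep Neural Network) acoustic models, are becoming standard, and practical recognition accuracy is now achievable. However, recognition performance in noisy environments and under distant-talking conditions is still insufficient. The aim of this research is to improve recognition accuracy in such conditions by integrating human auditory characteristics into the DNN acoustic model, in particular into its feature extraction stage. In FY2015 to FY2016, we proposed a method that uses deep learning to automatically learn a feature-extraction filterbank that reflects auditory characteristics. This improved recognition accuracy for speaker-independent recognition and yielded adaptation gains for speaker groups with abundant adaptation data. In FY2017, we investigated speaker adaptation. Because the auditory filterbank is a parametric model, it has few parameters to learn, which makes it well suited to adaptation. In addition, whereas up to the previous year we had used only a filterbank built from Gaussian functions, we constructed an auditory filterbank that can likewise be learned automatically using the gammatone filter function, which is widely used to model auditory filters. Furthermore, our evaluations so far had used a relatively small database (a mixture of the ASJ Continuous Speech Corpus for Research and a newspaper-article read-speech corpus, 77 hours in total); to confirm the effect on more natural speech, we also evaluated on the Corpus of Spontaneous Japanese (CSJ, 230 hours). In comparisons with various conventional methods, the proposed method improved recognition accuracy even for speaker adaptation with little adaptation data, and combining the proposed auditory filterbank with a conventional speaker adaptation method based on linear transformation proved particularly effective.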
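The outline names the filter shapes (Gaussian and gammatone) and argues that a parametric filterbank has few trainable parameters, which suits adaptation. The sketch below illustrates that argument under stated assumptions; it is not the authors' implementation. Each filter is hypothetically parameterized by a center frequency and a bandwidth, centers are initialized uniformly on the ERB-rate scale (Glasberg and Moore), and the gammatone shape uses the common approximate magnitude response (1 + ((f - f_c)/b)^2)^(-n/2) with order n = 4. All names (`FilterParams`, `init_filters`, `log_filterbank_energies`, and so on) are hypothetical. Only the forward computation is shown; in the reported work the filter parameters are learned jointly with the DNN.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FilterParams:
    # Hypothetical per-filter parameters; in a learnable filterbank these
    # two values would be updated by backpropagation (not shown here).
    center_hz: float
    bandwidth_hz: float


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg & Moore), approximately."""
    return 24.7 + 0.108 * f_hz


def hz_to_erb_rate(f_hz: float) -> float:
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e: float) -> float:
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def init_filters(num_filters: int, f_min: float, f_max: float) -> List[FilterParams]:
    """Space centers uniformly on the ERB-rate scale (requires num_filters >= 2).

    A bandwidth of 1.019 * ERB is the usual gammatone choice; reusing it as the
    Gaussian standard deviation is a hypothetical simplification.
    """
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    filters = []
    for i in range(num_filters):
        fc = erb_rate_to_hz(lo + (hi - lo) * i / (num_filters - 1))
        filters.append(FilterParams(fc, 1.019 * erb_hz(fc)))
    return filters


def gaussian_response(f_hz: float, p: FilterParams) -> float:
    # Gaussian-shaped filter with standard deviation p.bandwidth_hz.
    return math.exp(-0.5 * ((f_hz - p.center_hz) / p.bandwidth_hz) ** 2)


def gammatone_response(f_hz: float, p: FilterParams, order: int = 4) -> float:
    # Approximate magnitude response of an order-n gammatone filter,
    # (1 + ((f - fc) / b)^2)^(-n/2), ignoring the negative-frequency image.
    x = (f_hz - p.center_hz) / p.bandwidth_hz
    return (1.0 + x * x) ** (-order / 2.0)


Response = Callable[[float, FilterParams], float]


def log_filterbank_energies(power_spectrum: List[float], sample_rate: float,
                            filters: List[FilterParams], response: Response) -> List[float]:
    n_bins = len(power_spectrum)  # one-sided spectrum of a 2*(n_bins-1)-point FFT
    freqs = [k * sample_rate / (2 * (n_bins - 1)) for k in range(n_bins)]
    feats = []
    for p in filters:
        energy = sum(response(f, p) * s for f, s in zip(freqs, power_spectrum))
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats


def num_parameters(filters: List[FilterParams]) -> int:
    return 2 * len(filters)  # one center and one bandwidth per filter


if __name__ == "__main__":
    filters = init_filters(40, 100.0, 7800.0)
    flat = [1.0] * 257  # e.g. a 512-point FFT at 16 kHz
    feats = log_filterbank_energies(flat, 16000.0, filters, gammatone_response)
    # Parametric bank vs. an unconstrained 40 x 257 weight matrix
    print(len(feats), num_parameters(filters), 40 * 257)  # 40 80 10280
```

With 40 filters over 257 FFT bins, this hypothetical parametric bank exposes 80 values to adapt, compared with 10,280 for an unconstrained weight matrix of the same size, which illustrates why a parametric filterbank is attractive when adaptation data are scarce. Swapping `gammatone_response` for `gaussian_response` changes only the filter shape, not the parameter count.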

Research Products

Presentations (4 results; of which Int'l Joint Research: 3 results)

[Presentation] Robust lecture speech translation for speech misrecognition and its rescoring effect from multiple candidates (2017)
- Author(s)
  Sahashi Koya, Goto Norioki, Seki Hiroshi, Yamamoto Kazumasa, Akiba Tomoyoshi, Nakagawa Seiichi
- Organizer
  4th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA 2017)
- Int'l Joint Research
[Presentation] Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides (2017)
- Author(s)
  Tsujimura Shoko, Yamamoto Kazumasa, Nakagawa Seiichi
- Organizer
  INTERSPEECH 2017
- Int'l Joint Research
[Presentation] Detection of overlapping acoustic events based on NMF with shared basis vectors (2017)
- Author(s)
  Yamamoto Kazumasa, Ishikawa Chikara, Sahashi Koya, Nakagawa Seiichi
- Organizer
  IEEE 6th Global Conference on Consumer Electronics (GCCE 2017)
- Int'l Joint Research
[Presentation] Evaluation of DNN-based filterbank learning using the large-scale database CSJ (2017)
- Author(s)
  Seki Hiroshi, Yamamoto Kazumasa, Akiba Tomoyoshi, Nakagawa Seiichi
- Organizer
  Acoustical Society of Japan, 2017 Autumn Meeting
