2016 Fiscal Year Research-status Report

High-accuracy speech recognition in real environments using deep neural networks that incorporate human auditory characteristics

Research Project

Project/Area Number	15K00233
Research Institution	Toyohashi University of Technology
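Principal Investigator	Kazumasa Yamamoto, Toyohashi University of Technology, Graduate School of Engineering, Associate Professor (40324230)
Co-Investigator(Kenkyū-buntansha)	Seiichi Nakagawa, Toyohashi University of Technology, Organization for the Promotion of Leading Graduate School Education, Specially Appointed Professor (20115893)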
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	speech recognition / deep learning / Deep Neural Network / auditory characteristics / acoustic features / filterbank
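Outline of Annual Research Achievements	Acoustic models based on deep learning, i.e., DNN (Deep Neural Network) acoustic models, are becoming standard in speech recognition and now deliver practically useful accuracy. However, recognition performance in noisy environments and for distant speech is still insufficient. The aim of this research is to improve recognition accuracy under such conditions by integrating human auditory characteristics into the DNN acoustic model, particularly its feature extraction stage. This fiscal year we studied how to improve recognition performance by automatically learning the auditory characteristic related to human frequency resolution (the auditory filterbank). Learning a filterbank automatically requires that the filterbank be expressed as a parametric model. We therefore replaced the conventional non-parametric triangular filters with a filterbank of Gaussian functions embedded in the neural network, and developed a training algorithm and training program that learn the filter parameters as parameters of the network. Initially, although recognition performance improved, only the filterbank gains were learned and the filter center frequencies were not. Changing the optimization algorithm (specifically, from SGD to Adam) allowed all of the parameters to be learned. In speech recognition experiments with the trained DNN acoustic model, the automatically learned feature extraction filters gave higher recognition accuracy than feature extraction with the conventional triangular filterbank. Because the filter layer of the proposed model has few parameters, we expected speaker adaptation to be easy. We therefore attempted speaker adaptation by fixing the network parameters and training only the filter layer, but this did not yield a sufficient improvement in recognition accuracy, and the issue remains open.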
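The idea of a Gaussian filterbank whose parameters are trained by gradient descent can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the function names, the (gain, center, width) parameterization over FFT bin indices, the toy single-peak spectrum, and the objective (maximizing one filter's log energy) are all assumptions made for the example. The report does not say why SGD left the center frequencies unchanged; the sketch merely shows that the center frequency has a well-defined gradient and that Adam normalizes each parameter's step by its own gradient history.

```python
import math

def gaussian_filter(num_bins, gain, center, width):
    """Weights of one Gaussian filter over FFT bin indices 0..num_bins-1."""
    return [gain * math.exp(-0.5 * ((b - center) / width) ** 2)
            for b in range(num_bins)]

def log_energy(power_spectrum, gain, center, width):
    """Log of the filter-weighted sum of the power spectrum."""
    weights = gaussian_filter(len(power_spectrum), gain, center, width)
    return math.log(sum(w * p for w, p in zip(weights, power_spectrum)) + 1e-10)

def grad_center(power_spectrum, gain, center, width):
    """Analytic derivative of log_energy with respect to the center."""
    weights = gaussian_filter(len(power_spectrum), gain, center, width)
    energy = sum(w * p for w, p in zip(weights, power_spectrum)) + 1e-10
    d_energy = sum(w * p * (b - center) / width ** 2
                   for b, (w, p) in enumerate(zip(weights, power_spectrum)))
    return d_energy / energy

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Toy spectrum (hypothetical data): all energy in bin 20.
spectrum = [0.0] * 64
spectrum[20] = 1.0

# Learn only the center of one filter, starting away from the peak.
gain, center, width = 1.0, 10.0, 3.0
m = v = 0.0
for t in range(1, 501):
    loss_grad = -grad_center(spectrum, gain, center, width)  # minimize -log energy
    center, m, v = adam_step(center, loss_grad, m, v, t)
```

In this toy setting the center drifts toward the spectral peak. In a real acoustic model the gradient would come from the recognition loss backpropagated through the network rather than from the filter's own energy.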
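Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: This fiscal year we had originally planned to study how to improve recognition performance by incorporating the temporal characteristics of human hearing into the neural network, but after discussion we revised the research plan and instead studied how to improve recognition performance by automatically learning the auditory characteristic related to human frequency resolution (the auditory filterbank). As described in the outline, this year's work made it possible to automatically learn, together with the DNN acoustic model, a feature extraction filterbank that gives higher recognition accuracy than feature extraction with the conventional triangular filterbank. Although the smaller-than-expected effect of speaker adaptation remains an open issue, we judge that the research is progressing largely as planned.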
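Strategy for Future Research Activity	Because the plan was changed this fiscal year, in the final year we will study how to improve recognition accuracy by incorporating the temporal characteristics of human hearing into the neural network. In human hearing, it is known that input sound vibrates the basilar membrane, which triggers neural firing in the hair cells on the membrane, and that this firing is transmitted along the auditory nerve to the brain. The motion of the basilar membrane is continuous, but current speech recognition technology uses discrete-time features extracted from short-time frames cut out of the speech signal, so temporal continuity is broken. Because human hearing is sensitive to change, phoneme onsets (the rising portions) are said to be important in speech perception, yet current acoustic features are thought to lack the temporal resolution needed to handle onsets. In future work we will therefore aim to improve recognition accuracy by using acoustic features with higher temporal resolution. In addition, because application to noisy environments is currently insufficient, we will run recognition experiments in noisy environments in parallel and also explore the possibility of speaker adaptation.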
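How short-time framing fixes the temporal resolution of the features can be illustrated with a small sketch. The 16 kHz sampling rate, 25 ms frame length, and 10 ms and 2 ms frame shifts below are assumed, commonly used illustrative values, not settings reported for this project, and `frame_signal` is a hypothetical helper.

```python
def frame_signal(samples, frame_len, frame_shift):
    """Cut a signal into overlapping fixed-length frames (no padding)."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift
    return frames

sample_rate = 16000                 # assumed sampling rate in Hz
signal = [0.0] * sample_rate        # one second of (silent) audio

frame_len = 400                     # 25 ms at 16 kHz
coarse = frame_signal(signal, frame_len, 160)  # 10 ms shift
fine = frame_signal(signal, frame_len, 32)     # 2 ms shift

# One feature vector is produced per frame, so the frame shift sets how
# finely an onset can be located in time.
frames_per_second_coarse = len(coarse)
frames_per_second_fine = len(fine)
```

A shorter shift yields more feature vectors per second and therefore a finer time grid, at the cost of more computation.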
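Causes of Carryover	Part of the equipment and supplies costs could be covered by other research budgets, and the principal investigator was expected to change institutions in the next fiscal year, so spending on equipment and supplies was held back. This left an unspent balance.
Expenditure Plan for Carryover Budget	Because the change of institution did take place, funds are needed to set up the research environment. The carryover amount will therefore be used mainly for equipment and supplies.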

Research Products
(10 results)

All Journal Article (1 results) (of which Peer Reviewed: 1 results) Presentation (9 results) (of which Int'l Joint Research: 4 results)

[Journal Article] Short-utterance speech recognition based on speaker clustering (2017)
- Author(s)
  Hiroshi Seki, 榎並大介, 朱発強, Kazumasa Yamamoto, Seiichi Nakagawa
- Journal Title
  IEICE Transactions (電子情報通信学会論文誌)
  Volume: J100-D Pages: 81-92
- DOI
  10.14923/transinfj.2016JDP7063
- Peer Reviewed
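[Presentation] A study of speaker-class adaptation by retraining a DNN-based filterbank (2017)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus
- Year and Date
  2017-03-15 – 2017-03-17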
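[Presentation] A study of labeling and recognition methods for speech emotion that take context information into account (2017)
- Author(s)
  Masashi Takebe, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus
- Year and Date
  2017-03-15 – 2017-03-17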
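[Presentation] A study of a chat-oriented spoken dialogue system with transitions between domains (2017)
- Author(s)
  Yuma Shibahara, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus
- Year and Date
  2017-03-15 – 2017-03-17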
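[Presentation] A study of a method for automatically estimating which text, figures, and tables in lecture slides are being explained (2017)
- Author(s)
  辻村祥子, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus
- Year and Date
  2017-03-15 – 2017-03-17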
[Presentation] A deep neural network integrated with filterbank learning for speech recognition (2017)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
- Place of Presentation
  New Orleans, Louisiana, USA
- Year and Date
  2017-03-05 – 2017-03-09
- Int'l Joint Research
[Presentation] Lyric recognition in monophonic singing using pitch-dependent DNN (2017)
- Author(s)
  Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
- Place of Presentation
  New Orleans, Louisiana, USA
- Year and Date
  2017-03-05 – 2017-03-09
- Int'l Joint Research
[Presentation] Investigation of glottal features and annotation procedure for speech emotion recognition (2016)
- Author(s)
  Masashi Takebe, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2016)
- Place of Presentation
  Jeju, Korea
- Year and Date
  2016-12-13 – 2016-12-16
- Int'l Joint Research
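[Presentation] A study of DNN-based filterbank learning for speech recognition (2016)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2016 Autumn Meeting
- Place of Presentation
  University of Toyama, Gofuku Campus
- Year and Date
  2016-09-14 – 2016-09-16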
[Presentation] Effect of sympathetic relation and unsympathetic relation in multi-agent spoken dialogue system (2016)
- Author(s)
  Yuma Shibahara, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA 2016)
- Place of Presentation
  Jeju, Korea
- Year and Date
  2016-08-17 – 2016-08-18
- Int'l Joint Research
