Proposal of a General-Purpose Framework for Highly Efficient Adaptive Learning Using Deep Neural Networks

Research Project

Project/Area Number	15J02418
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Perceptual information processing
Research Institution	Doshisha University
Principal Investigator	Tsubasa Ochiai (落合翼), Doshisha University, Graduate School of Science and Engineering, JSPS Research Fellow (DC1)
Project Period (FY)	2015-04-24 – 2018-03-31
Project Status	Completed (Fiscal Year 2017)
Budget Amount	¥2,800,000 (Direct Cost: ¥2,800,000) Fiscal Year 2017: ¥900,000 (Direct Cost: ¥900,000) Fiscal Year 2016: ¥900,000 (Direct Cost: ¥900,000) Fiscal Year 2015: ¥1,000,000 (Direct Cost: ¥1,000,000)
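Keywords	Multichannel end-to-end speech recognition / Speaker and environment adaptation of end-to-end models / Evaluation experiments on environment adaptation tasks / Online model adaptation / Automatic optimization of network structure / Introduction of linear transformation networks / Matrix rank-based analysis / Introduction of bottleneck structures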
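Outline of Annual Research Achievements	This research project aims to build a general adaptive learning framework, based on deep neural networks (DNNs), that is not restricted to a particular target problem. This fiscal year, applying the project's core concept of "aggregating and localizing functions inside the DNN," we proposed a methodology that lets the DNN itself automatically acquire, through training, the ability to adapt to its environment. In the proposed method, prior knowledge about the target problem (for example, that noisy speech recognition requires a noise suppression function) is embedded as mathematical formulas into the internal structure of the DNN, which is normally treated as a black box. This guides the direction of DNN training, and we succeeded in having the DNN acquire internally the functions that are desirable for the target problem. The results of this fiscal year fall broadly into the following two items. (1) Proposal of an end-to-end speech recognition model structure that automatically acquires a noise suppression function. For end-to-end speech recognition models, which build the entire sequence of speech recognition procedures on a single DNN, we proposed a network structure that automatically acquires a noise suppression function through training, by embedding multichannel signal processing techniques as formulas inside the network. Evaluation experiments confirmed that the proposed end-to-end speech recognition model acquires a high ability to adapt to noisy environments and achieves higher recognition performance than conventional methods in noisy speech recognition. (2) Verification of the effectiveness of model adaptation techniques for the proposed end-to-end speech recognition model. We verified through evaluation experiments whether combining the model adaptation techniques studied up to the previous fiscal year with the end-to-end speech recognition model proposed this fiscal year yields a further improvement in recognition performance. The experiments confirmed that, with the proposed methods combined, the end-to-end speech recognition model attains a further improvement in recognition performance.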
Research Progress Status	Not entered, because FY2017 (Heisei 29) was the final fiscal year.
Strategy for Future Research Activity	Not entered, because FY2017 (Heisei 29) was the final fiscal year.

Report

(3 results)

Research Products
(15 results)

Int'l Joint Research (1 result) Journal Article (2 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 2 results, Acknowledgement Compliant: 1 result) Presentation (12 results) (of which Int'l Joint Research: 7 results, Invited: 1 result)

[Int'l Joint Research] MERL (USA)
- Related Report
  2016 Annual Research Report
[Journal Article] Unified Architecture for Multichannel End-to-end Speech Recognition with Neural Beamforming (2017)
- Author(s)
  Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey, Xiong Xiao
- Journal Title
  IEEE Journal of Selected Topics in Signal Processing (JSTSP)
  Volume: 11 Issue: 8 Pages: 1274-1288
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers (2016)
- Author(s)
  Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Xugang Lu, Chiori Hori, Hisashi Kawai, Shigeru Katagiri
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E99.D Issue: 10 Pages: 2431-2443
- DOI
  10.1587/transinf.2016SLP0010
- NAID
  130005598241
- ISSN
  0916-8532, 1745-1361
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Presentation] Speaker Adaptation for Multichannel End-to-end Speech Recognition (2018)
- Author(s)
  Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John R. Hershey
- Organizer
  International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Automatic node selection for deep neural networks using group lasso regularization (2017)
- Author(s)
  Tsubasa Ochiai
- Organizer
  International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Place of Presentation
  New Orleans (USA)
- Year and Date
  2017-03-08
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models (2017)
- Author(s)
  Tsubasa Ochiai
- Organizer
  International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Place of Presentation
  New Orleans (USA)
- Year and Date
  2017-03-06
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Multichannel end-to-end speech recognition (2017)
- Author(s)
  Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
- Organizer
  International Conference on Machine Learning (ICML)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR (2017)
- Author(s)
  Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri
- Organizer
  IEEE International Workshop on Machine Learning for Signal Processing (MLSP)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
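[Presentation] Deep Learning: Fundamentals and Applications (with a Focus on Speech Recognition) (2016)
- Author(s)
  Tsubasa Ochiai
- Organizer
  Acoustical Society of Japan
- Place of Presentation
  Kyoto Prefecture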
- Year and Date
  2016-03-28
- Related Report
  2015 Annual Research Report
- Invited
[Presentation] Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer (2016)
- Author(s)
  Tsubasa Ochiai
- Organizer
  International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Place of Presentation
  Shanghai
- Year and Date
  2016-03-23
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
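[Presentation] Proposal of a Bottleneck Speaker Adaptation Method for DNNs Trained with Speaker Adaptive Training Using Linear Transformation Networks (2016)
- Author(s)
  Tsubasa Ochiai
- Organizer
  Acoustical Society of Japan
- Place of Presentation
  Kanagawa Prefecture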
- Year and Date
  2016-03-11
- Related Report
  2015 Annual Research Report
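[Presentation] Matrix Rank-Based Behavior Analysis of DNNs Trained with Speaker Adaptive Training (2015)
- Author(s)
  Tsubasa Ochiai
- Organizer
  Acoustical Society of Japan
- Place of Presentation
  Fukushima Prefecture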
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
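[Presentation] Report on Participation in the International Conference ICASSP 2015 (2015)
- Author(s)
  Tsubasa Ochiai and several others
- Organizer
  Information Processing Society of Japan
- Place of Presentation
  Nagano Prefecture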
- Year and Date
  2015-07-16
- Related Report
  2015 Annual Research Report
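[Presentation] Experimental Evaluation of the Effect of Network Size in DNNs Trained with Speaker Adaptive Training Using Linear Transformation Networks (2015)
- Author(s)
  Tsubasa Ochiai
- Organizer
  Institute of Electronics, Information and Communication Engineers (IEICE)
- Place of Presentation
  Nagano Prefecture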
- Year and Date
  2015-07-16
- Related Report
  2015 Annual Research Report
[Presentation] Speaker adaptive training using deep neural networks embedding linear transformation networks (2015)
- Author(s)
  Tsubasa Ochiai
- Organizer
  International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Place of Presentation
  Brisbane
- Year and Date
  2015-04-21
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
