Accurate speech recognition system with deep neural network introducing human auditory characteristic in real environments

Research Project

Project/Area Number	15K00233
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
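Research Institution	Chubu University (2017), Toyohashi University of Technology (2015-2016)
Principal Investigator	YAMAMOTO Kazumasa Chubu University, College of Engineering, Associate Professor (40324230)
Co-Investigator (Kenkyū-buntansha)	NAKAGAWA Seiichi Toyohashi University of Technology, Leading Graduate School Education Promotion Organization, Specially Appointed Professor (20115893)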
Project Period (FY)	2015-04-01 – 2018-03-31
Project Status	Completed (Fiscal Year 2017)
Budget Amount	¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
  Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
  Fiscal Year 2016: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
  Fiscal Year 2015: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Keywords	speech recognition / deep learning / Deep Neural Network / auditory characteristics / acoustic features / filterbank / speaker adaptation
Outline of Final Research Achievements	Deep learning has been introduced into speech recognition and the technology is gradually entering practical use, but recognition performance is still insufficient in noisy environments and for distant-talking speech. The purpose of this research was to improve speech recognition accuracy by combining a DNN (Deep Neural Network) acoustic model with human auditory characteristics. We proposed a method that uses deep learning to automatically learn the feature-extraction filterbank at the bottom of the DNN acoustic model while taking human auditory characteristics into account. The method improved accuracy for speaker-independent speech recognition. It also improved speaker-adapted recognition accuracy even when only a small amount of adaptation data was available. These results show the effectiveness of the proposed method.
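
The idea of a filterbank that sits at the bottom of the acoustic model and is tuned by training can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the Gaussian filter shape, the mel-spaced initialization, and all function and parameter names (`init_filterbank`, `center`, `bandwidth`, `gain`) are assumptions made for the example. In a real system the three parameters per filter would be updated by backpropagation together with the DNN weights; here only the forward computation is shown.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (a common auditory warping)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def init_filterbank(num_filters, sample_rate):
    """Return mel-spaced starting parameters, one dict per filter.

    Assumption: each filter is described by a center frequency, a bandwidth
    and a gain, which are the quantities training would adjust.
    """
    max_mel = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 equally spaced mel points; interior points are centers.
    points = [mel_to_hz(max_mel * i / (num_filters + 1))
              for i in range(num_filters + 2)]
    params = []
    for k in range(1, num_filters + 1):
        bandwidth = (points[k + 1] - points[k - 1]) / 2.0
        params.append({"center": points[k], "bandwidth": bandwidth, "gain": 1.0})
    return params


def filter_weights(p, num_bins, sample_rate):
    """Gaussian-shaped frequency response sampled at each FFT bin."""
    nyquist = sample_rate / 2.0
    weights = []
    for b in range(num_bins):
        f = nyquist * b / (num_bins - 1)
        z = (f - p["center"]) / p["bandwidth"]
        weights.append(p["gain"] * math.exp(-0.5 * z * z))
    return weights


def log_filterbank_features(power_spectrum, params, sample_rate):
    """Apply every filter to one power spectrum and take the log energy."""
    num_bins = len(power_spectrum)
    feats = []
    for p in params:
        w = filter_weights(p, num_bins, sample_rate)
        energy = sum(wi * si for wi, si in zip(w, power_spectrum))
        feats.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return feats


# Usage: 8 filters over a flat 257-bin power spectrum sampled at 16 kHz.
params = init_filterbank(8, 16000)
features = log_filterbank_features([1.0] * 257, params, 16000)
```

Because the response is a smooth function of `center`, `bandwidth` and `gain`, the features are differentiable with respect to those parameters, which is what would allow such a layer to be trained jointly with the network above it.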

Report

(4 results)

2017 Annual Research Report, Final Research Report
2016 Research-status Report
2015 Research-status Report

Research Products
(22 results)

Journal Article (2 results; of which Peer Reviewed: 2, Open Access: 1), Presentation (20 results; of which Int'l Joint Research: 11)

[Journal Article] Speech Recognition of Short Time Utterance Based on Speaker Clustering (2017)
- Author(s)
  関博史, 榎並大介, 朱発強, 山本一公, 中川聖一
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition) (電子情報通信学会論文誌D 情報・システム)
  Volume: J100-D Issue: 1 Pages: 81-92
- DOI
  10.14923/transinfj.2016JDP7063
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2017-01-01
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Chat-like Spoken Dialogue System with Multiple Agents (2016)
- Author(s)
  藤堂祐樹, 西村良太, 山本一公, 中川聖一
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition) (電子情報通信学会論文誌D 情報・システム)
  Volume: J99-D Issue: 2 Pages: 188-200
- DOI
  10.14923/transinfj.2015JDP7010
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2016-02-01
- Related Report
  2015 Research-status Report
- Peer Reviewed
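[Presentation] A Study of Speaker-Class Adaptation by Retraining DNN-Based Filterbanks (original title: DNNに基づくフィルタバンクの再学習による話者クラス適応の検討) (2017)
- Author(s)
  関博史, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus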
- Year and Date
  2017-03-15
- Related Report
  2016 Research-status Report
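[Presentation] A Study of Labeling and Recognition Methods Considering Contextual Information of Speech Emotion (original title: 音声感情のコンテキスト情報を考慮したラベリングと認識手法の検討) (2017)
- Author(s)
  竹部真晃, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus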
- Year and Date
  2017-03-15
- Related Report
  2016 Research-status Report
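[Presentation] A Study of a Chat-Oriented Spoken Dialogue System with Inter-Domain Transitions (original title: ドメイン間遷移を持つ雑談音声対話システムの検討) (2017)
- Author(s)
  芝原優真, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus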
- Year and Date
  2017-03-15
- Related Report
  2016 Research-status Report
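[Presentation] A Study of an Automatic Explanation Spot Estimation Method Targeting Text and Figures in Lecture Slides (original title: 講義スライド中の文章・図表を対象とする説明箇所自動推定手法の検討) (2017)
- Author(s)
  辻村祥子, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus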
- Year and Date
  2017-03-15
- Related Report
  2016 Research-status Report
[Presentation] A deep neural network integrated with filterbank learning for speech recognition (2017)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
- Place of Presentation
  New Orleans, Louisiana, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Research-status Report
- Int'l Joint Research
[Presentation] Lyric recognition in monophonic singing using pitch-dependent DNN (2017)
- Author(s)
  Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
- Place of Presentation
  New Orleans, Louisiana, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Research-status Report
- Int'l Joint Research
[Presentation] Robust lecture speech translation for speech misrecognition and its rescoring effect from multiple candidates (2017)
- Author(s)
  Sahashi Koya, Goto Norioki, Seki Hiroshi, Yamamoto Kazumasa, Akiba Tomoyoshi, Nakagawa Seiichi
- Organizer
  4th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA 2017)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides (2017)
- Author(s)
  Tsujimura Shoko, Yamamoto Kazumasa, Nakagawa Seiichi
- Organizer
  INTERSPEECH 2017
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Detection of overlapping acoustic events based on NMF with shared basis vectors (2017)
- Author(s)
  Yamamoto Kazumasa, Ishikawa Chikara, Sahashi Koya, Nakagawa Seiichi
- Organizer
  IEEE 6th Global Conference on Consumer Electronics (GCCE 2017)
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
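[Presentation] Evaluation of DNN-Based Filterbank Learning Using the Large-Scale CSJ Database (original title: 大規模データベースCSJを用いたDNNに基づくフィルタバンク学習の評価) (2017)
- Author(s)
  関博史, 山本一公, 秋葉友良, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Autumn Meeting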
- Related Report
  2017 Annual Research Report
[Presentation] Investigation of glottal features and annotation procedure for speech emotion recognition (2016)
- Author(s)
  Masashi Takebe, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2016)
- Place of Presentation
  Jeju, Korea
- Year and Date
  2016-12-13
- Related Report
  2016 Research-status Report
- Int'l Joint Research
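[Presentation] A Study of DNN-Based Filterbank Learning for Speech Recognition (original title: 音声認識のためのDNNに基づくフィルタバンクの学習の検討) (2016)
- Author(s)
  関博史, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ) 2016 Autumn Meeting
- Place of Presentation
  University of Toyama, Gofuku Campus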
- Year and Date
  2016-09-14
- Related Report
  2016 Research-status Report
[Presentation] Effect of sympathetic relation and unsympathetic relation in multi-agent spoken dialogue system (2016)
- Author(s)
  Yuma Shibahara, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA 2016)
- Place of Presentation
  Jeju, Korea
- Year and Date
  2016-08-17
- Related Report
  2016 Research-status Report
- Int'l Joint Research
[Presentation] Speech analysis of sung-speech and lyric recognition in monophonic singing (2016)
- Author(s)
  Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing
- Place of Presentation
  Shanghai, China
- Year and Date
  2016-03-20
- Related Report
  2015 Research-status Report
- Int'l Joint Research
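[Presentation] A Study of Unsupervised Sequential Adaptation of Convolutional Neural Networks (original title: 畳み込みニューラルネットワークの教師なし逐次適応学習の検討) (2016)
- Author(s)
  関博史, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ)
- Place of Presentation
  Toin University of Yokohama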
- Year and Date
  2016-03-09
- Related Report
  2015 Research-status Report
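[Presentation] Recognition of Speech Mixed with Arbitrary Music Using NMF (original title: NMFによる任意の音楽重畳音声の認識) (2016)
- Author(s)
  橋本尚亮, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ)
- Place of Presentation
  Toin University of Yokohama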
- Year and Date
  2016-03-09
- Related Report
  2015 Research-status Report
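[Presentation] Feature Analysis of Singing Voice and a Study of Lyric Recognition Considering Pitch Features (original title: 歌声音声の特徴分析とピッチ特徴量を考慮した歌詞認識の検討) (2016)
- Author(s)
  川井大陸, 山本一公, 中川聖一
- Organizer
  Acoustical Society of Japan (ASJ)
- Place of Presentation
  Toin University of Yokohama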
- Year and Date
  2016-03-09
- Related Report
  2015 Research-status Report
[Presentation] Speech recognition based on Itakura-Saito divergence and dynamics / sparseness constraints from mixed sound of speech and music by non-negative matrix factorization (2015)
- Author(s)
  Naoaki Hashimoto, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
- Place of Presentation
  Hong Kong
- Year and Date
  2015-12-16
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] Deep neural network based acoustic model using speaker-class information for short time utterance (2015)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
- Place of Presentation
  Hong Kong
- Year and Date
  2015-12-16
- Related Report
  2015 Research-status Report
- Int'l Joint Research
[Presentation] Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction (2015)
- Author(s)
  Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  INTERSPEECH
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06
- Related Report
  2015 Research-status Report
- Int'l Joint Research
