Automatic classification of speech and audio signals using large-scale corpus and its application to speech recognition

Research Project

Project/Area Number	25330183
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Yamagata University
Principal Investigator	Kosaka Tetsuo, Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
Co-Investigator (Renkei-kenkyūsha)	KATO Masaharu, Yamagata University, Graduate School of Science and Engineering, Assistant Professor (10250953)
Project Period (FY)	2013-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000); Fiscal Year 2015: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000); Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2013: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
Keywords	speech recognition / acoustic model / clustering / hidden Markov model / deep neural network / deep neural net / speaker adaptation / speaker / speech corpus
Outline of Final Research Achievements	Speech recognition performance has been improving thanks to the growth of speech corpora and advances in computing power. However, speech and audio signals vary widely in characteristics such as speaker traits and background noise, and this variability can degrade recognition performance. In this study, we address the problem with clustering techniques: we aim to improve recognition performance by using class models trained on data categorized according to acoustic features. The models were trained on a large-scale Japanese speech corpus. As acoustic models, we use not only Gaussian mixture models (GMMs) but also deep neural networks (DNNs).
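The outline does not specify how the training data were categorized. As a minimal sketch only, assuming each training speaker is summarized by one mean acoustic feature vector and that speaker classes are formed with plain k-means (an assumption, not necessarily the project's method), the code below partitions speakers into classes and assigns a test speaker to the nearest class, whose class model would then be used. The helper names `cluster_speakers` and `assign_class` and the toy data are hypothetical; training the per-class GMM or DNN acoustic models is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_speakers(points: Sequence[Sequence[float]], k: int,
                     max_iters: int = 100) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: k-means over per-speaker feature vectors.

    Returns (class centroids, class label for each speaker).
    """
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each speaker joins its nearest class centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no speaker changed class
        labels = new_labels
        # Update step: recompute each centroid; keep the old one if a class is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def assign_class(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Hypothetical helper: pick the class (and hence class model) nearest to a test speaker."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))


if __name__ == "__main__":
    # Made-up 2-D "speaker features" forming two well-separated groups.
    speakers = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
    cents, labs = cluster_speakers(speakers, k=2)
    print(labs)                             # [0, 0, 0, 1, 1, 1]
    print(assign_class([4.8, 5.2], cents))  # 1
```

Nearest-centroid selection is used here only for brevity; a recognizer could instead score the test utterance with every class model and pick, or combine, the most likely ones.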

Report

(4 results)

2015 Annual Research Report
Final Research Report
2014 Research-status Report
2013 Research-status Report

Research Products
(18 results)

Journal Article (4 results; Peer Reviewed: 4, Open Access: 2, Acknowledgement Compliant: 2) / Presentation (12 results) / Book (1 result) / Remarks (1 result)

[Journal Article] Deep Neural Network-Based Speech Recognition with Combination of Speaker-Class Models (2015)
- Author(s)
  Tetsuo Kosaka, Kazuki Konno, Masaharu Kato
- Journal Title
  
  Proc. of APSIPA ASC 2015
  
  Volume: SP2-2.3 Pages: 1-4
- DOI
  10.1109/apsipa.2015.7415464
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Unsupervised cross-adaptation using language model and deep learning based acoustic model adaptations (2014)
- Author(s)
  Akira Takagi, Kazuki Konno, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of APSIPA ASC 2014
  
  Volume: WA-P-16 Pages: 1-4
- DOI
  10.1109/apsipa.2014.7041581
- Related Report
  2014 Research-status Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Speech Recognition Using Speaker-Class Acoustic Models and Word-Graph Integration (2013)
- Author(s)
  小坂哲夫，伊藤貴，加藤正治，好田正紀
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition)
  
  Volume: J96-D, No. 11 Pages: 2795-2803
- NAID
  110009661670
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Journal Article] Speech recognition with large-scale speaker-class-based acoustic modeling (2013)
- Author(s)
  Kazuki Konno, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of APSIPA ASC 2013
  
  Volume: OS.28-SLA..9, 113 Pages: 1-4
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Presentation] A Study of Voice Activity Detection in Movies Using Deep Neural Networks (2016)
- Author(s)
  菅郁巳, 安原龍, 井上雅史, 小坂哲夫
- Organizer
  Proceedings of the Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Toin University of Yokohama
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] Speech Recognition Using Speaker-Class Acoustic Models Based on Deep Neural Networks (2015)
- Author(s)
  今野和樹，加藤正治，小坂哲夫
- Organizer
  Proceedings of the Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  University of Aizu
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
[Presentation] A Study on Improving Unsupervised Cross-Adaptation Using DNN-HMM (2015)
- Author(s)
  高木瑛, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Chuo University, Korakuen Campus
- Year and Date
  2015-03-16 – 2015-03-18
- Related Report
  2014 Research-status Report
[Presentation] Speech Recognition Using Maximum-Likelihood Integration of Speaker-Class DNN Outputs (2015)
- Author(s)
  今野和樹，加藤正治，小坂哲夫
- Organizer
  Proceedings of the Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Chuo University, Korakuen Campus
- Year and Date
  2015-03-16 – 2015-03-18
- Related Report
  2014 Research-status Report
[Presentation] A Study of the Number of Parameters in Speech Recognition Using DNN-HMM (2015)
- Author(s)
  小野瑞穂, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Research Meeting
- Place of Presentation
  Yamagata University, Faculty of Engineering
- Year and Date
  2015-03-04
- Related Report
  2014 Research-status Report
[Presentation] Error Analysis of Japanese Lecture Speech Recognition Using the Results of Supervised Adaptation by Deep Learning (2014)
- Author(s)
  小野瑞穂，小関翔太，加藤正治，小坂哲夫
- Organizer
  Proceedings of the Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Hokkai-Gakuen University, Toyohira Campus
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Research-status Report
[Presentation] A Study of Speaker-Class Models Using DNNs for Speech Recognition (2014)
- Author(s)
  今野和樹，高木瑛，加藤正治，小坂哲夫
- Organizer
  Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan
- Place of Presentation
  Yamagata University, Faculty of Engineering
- Year and Date
  2014-08-21 – 2014-08-22
- Related Report
  2014 Research-status Report
[Presentation] Cross-Adaptation of Acoustic and Language Models Using DNN-HMM (2014)
- Author(s)
  高木瑛, 今野和樹, 加藤正治, 小坂哲夫
- Organizer
  IPSJ SIG Technical Report (Spoken Language Processing)
- Place of Presentation
  Tokyo Institute of Technology, Ookayama Campus
- Year and Date
  2014-05-22 – 2014-05-23
- Related Report
  2014 Research-status Report
[Presentation] A Study of Speaker Adaptation in Japanese Lecture Speech Recognition Using DNN-HMM (2014)
- Author(s)
  小坂哲夫, 今野和樹, 高木瑛, 加藤正治
- Organizer
  Proceedings of the Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Nihon University, College of Science and Technology
- Related Report
  2013 Research-status Report
[Presentation] A Study on Improving Speech Recognition Accuracy Using Large-Scale Speaker-Class Acoustic Models (2013)
- Author(s)
  今野和樹, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Meeting of the Acoustical Society of Japan
- Place of Presentation
  Toyohashi University of Technology
- Related Report
  2013 Research-status Report
[Presentation] Speech Alignment Using Word Graphs (2013)
- Author(s)
  加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Meeting of the Acoustical Society of Japan
- Place of Presentation
  Toyohashi University of Technology
- Related Report
  2013 Research-status Report
[Presentation] Improving Voice Activity Detection by Modeling Noise-Overlapped Segments
- Author(s)
  佐々木志貢, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Research Meeting
- Place of Presentation
  Yamagata University, Faculty of Engineering
- Related Report
  2013 Research-status Report
[Book] Evolving Speech Communication between Humans and Machines, Part 1, Chapter 2 (2015)
- Author(s)
  小坂哲夫
- Total Pages
  10
- Publisher
  Nikkei Printing Co., Ltd.
- Related Report
  2014 Research-status Report
[Remarks] Kosaka Laboratory homepage
- URL
  http://speech-lab.yz.yamagata-u.ac.jp/
- Related Report
  2015 Annual Research Report