A study on acoustic model adaptation for deep-learning-based speech recognition

Research Project

Project/Area Number	16K00227
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Yamagata University
Principal Investigator	Kosaka Tetsuo Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
Research Collaborator	KATO Masaharu
Project Period (FY)	2016-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000); Fiscal Year 2018: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2017: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2016: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
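Keywords	speech recognition / acoustic model / deep neural network / adaptation techniques / spontaneous speech / emotional speech / voice activity detection / deep learning / emotional speech recognition / neural network / speaker adaptation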
Outline of Final Research Achievements	Deep-learning-based speech recognition has made great progress in recent years, but recognition of spontaneous speech has not yet reached sufficient accuracy. The major factors behind this performance degradation are variation in speaker characteristics, acoustic environments, and speaking styles. To address these problems, I developed techniques centered on acoustic-model adaptation to improve speech-recognition performance. As a result, recognition performance improved for both spontaneous and emotional speech, and the performance of voice-activity detection also improved.
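Academic Significance and Societal Importance of the Research Achievements	This research achieved 1) improved adaptation accuracy in spontaneous speech recognition, 2) improved accuracy of voice activity detection in noisy conditions, and 3) improved performance in emotional speech recognition. Result 1) is a highly versatile adaptation method that is applicable not only to spontaneous speech recognition but also to other fields. A multimodal dialogue corpus has been built using the results of 2), which should be useful to researchers in that field. Result 3) can also be used in various areas, such as conversation between robots and humans. The techniques developed in this research therefore have broad ripple effects and are considered to be of high academic and societal significance.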

Report

(4 results)

2018 Annual Research Report, Final Research Report (PDF)
2017 Research-status Report
2016 Research-status Report

Research Products
(24 results)


Journal Article (6 results) (of which Peer Reviewed: 6 results, Open Access: 6 results) Presentation (13 results) (of which Int'l Joint Research: 1 result) Remarks (5 results)

[Journal Article] Unsupervised Cross Adaptation Using Deep Neural Networks in Speech Recognition Systems (2018)
- Author(s)
  冨田健斗、高木瑛、加藤正治、小坂哲夫
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition) (電子情報通信学会論文誌D 情報・システム)
  
  Volume: J101-D Issue: 8 Pages: 1190-1199
- DOI
  10.14923/transinfj.2017JDP7076
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2018-08-01
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Acoustic Model Adaptation for Emotional Speech Recognition Using Twitter-Based Emotional Speech Corpus (2018)
- Author(s)
  Kosaka Tetsuo、Aizawa Yoshitaka、Kato Masaharu、Nose Takashi
- Journal Title
  
  Proc. of APSIPA ASC 2018
  
  Volume: - Pages: 1747-1751
- DOI
  10.23919/apsipa.2018.8659756
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Improving Voice Activity Detection for Multimodal Movie Dialogue Corpus (2018)
- Author(s)
  Kosaka Tetsuo、Suga Ikumi、Inoue Masashi
- Journal Title
  
  2018 IEEE 7th Global Conference on Consumer Electronics (GCCE)
  
  Volume: - Pages: 481-484
- DOI
  10.1109/gcce.2018.8574730
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Large-scale multimodal movie dialogue corpus (2016)
- Author(s)
  Ryu Yasuhara, Masashi Inoue, Ikumi Suga and Tetsuo Kosaka
- Journal Title
  
  Proc. of the 18th ACM International Conference on Multimodal Interaction
  
  Volume: - Pages: 414-415
- DOI
  10.1145/2993148.2998523
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Many-to-many voice conversion using hidden Markov model-based speech recognition and synthesis (2016)
- Author(s)
  Y. Aizawa, M. Kato and T. Kosaka
- Journal Title
  
  The Journal of the Acoustical Society of America
  
  Volume: 140 Issue: 4_Supplement Pages: 2964-2964
- DOI
  10.1121/1.4969167
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Voice activity detection in movies using multi-class deep neural networks (2016)
- Author(s)
  I. Suga, R. Yasuhara, M. Inoue and T. Kosaka
- Journal Title
  
  The Journal of the Acoustical Society of America
  
  Volume: 140 Issue: 4_Supplement Pages: 3116-3116
- DOI
  10.1121/1.4969758
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access
[Presentation] A basic study of emotion recognition on the Japanese emotional speech corpus JTES (2019)
- Author(s)
  羽田優花，加藤正治，小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Related Report
  2018 Annual Research Report
[Presentation] Improving emotional speech recognition and prosody-controlled voice conversion through language model refinement (2019)
- Author(s)
  佐伯和哉，加藤正治，小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Related Report
  2018 Annual Research Report
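[Presentation] Acoustic model adaptation in emotional speech recognition and its application to voice conversion (2018)
- Author(s)
  小坂哲夫，相澤佳孝，加藤正治，能勢隆
- Organizer
  Proceedings of the Acoustical Society of Japan Autumn Meeting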
- Related Report
  2018 Annual Research Report
[Presentation] Performance evaluation of unsupervised cross adaptation using DNNs (2018)
- Author(s)
  冨田建斗，加藤正治，小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Related Report
  2017 Research-status Report
[Presentation] A study of training data for emotion recognition using spontaneous dialogue speech (2018)
- Author(s)
  真壁大介，加藤正治，小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Related Report
  2017 Research-status Report
[Presentation] Building a multimodal dialogue corpus from movies (2017)
- Author(s)
  井上雅史，安原龍，菅郁巳，小坂哲夫
- Organizer
  Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2017 Research-status Report
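[Presentation] A study of DNN-HMM acoustic model adaptation for emotional speech recognition using the emotional speech database JTES (2017)
- Author(s)
  相澤佳孝，小坂哲夫，加藤正治，能勢隆
- Organizer
  Proceedings of the Acoustical Society of Japan Autumn Meeting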
- Related Report
  2017 Research-status Report
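[Presentation] A study of class categorization in DNN-based voice activity detection for movies (2017)
- Author(s)
  菅郁巳，小坂哲夫，井上雅史
- Organizer
  Proceedings of the Acoustical Society of Japan Autumn Meeting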
- Related Report
  2017 Research-status Report
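[Presentation] A study on improving model adaptation performance for emotional speech recognition using the emotional speech database JTES (2017)
- Author(s)
  相澤佳孝，小坂哲夫，加藤正治，能勢隆
- Organizer
  IPSJ SIG Technical Reports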
- Related Report
  2017 Research-status Report
[Presentation] A study of voice conversion of emotional speech using DNN-based speech recognition (2017)
- Author(s)
  笹田拓臣，相澤佳孝, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Yamagata University
- Related Report
  2016 Research-status Report
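[Presentation] Evaluation of unsupervised cross adaptation using a high-accuracy initial model (2016)
- Author(s)
  冨田健斗, 高木瑛, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  University of Toyama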
- Year and Date
  2016-09-14
- Related Report
  2016 Research-status Report
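[Presentation] Improving voice conversion of emotional speech based on HMM recognition and synthesis (2016)
- Author(s)
  相澤佳孝, 中川由暁, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  University of Toyama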
- Year and Date
  2016-09-14
- Related Report
  2016 Research-status Report
[Presentation] Voice Conversion of emotional speech using hidden Markov model-based speech recognition and synthesis (2016)
- Author(s)
  Tetsuo Kosaka, Yoshiaki Nakagawa and Masaharu Kato
- Organizer
  Proc. of 22nd International Congress on Acoustics
- Place of Presentation
  Buenos Aires, Argentina
- Year and Date
  2016-09-05
- Related Report
  2016 Research-status Report
- Int'l Joint Research
[Remarks] Kosaka Laboratory
- URL
  https://speech-lab.yz.yamagata-u.ac.jp/
- Related Report
  2018 Annual Research Report
[Remarks] Movie Dialogue Corpus
- URL
  http://www.ice.tohtech.ac.jp/~inoue/moviedialcorpus/index.html
- Related Report
  2018 Annual Research Report
[Remarks] Kosaka Laboratory
- URL
  http://speech-lab.yz.yamagata-u.ac.jp/
- Related Report
  2017 Research-status Report
[Remarks] Kosaka Laboratory
- URL
  http://speech-lab.yz.yamagata-u.ac.jp/index.html
- Related Report
  2016 Research-status Report
[Remarks] Movie Dialogue Corpus
- URL
  http://i.yz.yamagata-u.ac.jp/moviedialcorpus/
- Related Report
  2016 Research-status Report
