Speech information processing using deep generative models and their factorization

Research Project

Project/Area Number	25280058
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Partial Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Tokyo Institute of Technology
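Principal Investigator	Shinoda Koichi Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (10343097)
Co-Investigator (Kenkyū-buntansha)	IWANO Koji Tokyo City University, Faculty of Media, Professor (90323823); SHINOZAKI Takahiro Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Associate Professor (80447903)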
Project Period (FY)	2013-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥16,900,000 (Direct Cost: ¥13,000,000, Indirect Cost: ¥3,900,000); Fiscal Year 2015: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2014: ¥6,500,000 (Direct Cost: ¥5,000,000, Indirect Cost: ¥1,500,000); Fiscal Year 2013: ¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000)
Keywords	Speech information processing / Deep learning / Speaker adaptation / Multimodal processing
Outline of Final Research Achievements	In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data collected from many speakers. In this study, we developed a framework that improves the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of their structure. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method based on a student-teacher learning framework with soft targets. We improved speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.
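
As an illustration of the soft-target idea mentioned in the outline, the following is a minimal sketch of a generic student-teacher loss, not the project's actual training code. The function names, the temperature parameter, and the example logits are assumptions introduced here for illustration only. The student is trained to match the teacher's softened output distribution rather than a one-hot label.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft targets and the student's outputs.

    Hypothetical helper: both networks are represented only by their output logits
    for a single frame, and the temperature value is an arbitrary choice.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Made-up logits over three output classes for one frame.
teacher = [2.0, 1.0, 0.1]
matching_student = [2.0, 1.0, 0.1]
mismatched_student = [0.1, 1.0, 2.0]

# The loss is lowest when the student reproduces the teacher's distribution.
print(soft_target_loss(matching_student, teacher))
print(soft_target_loss(mismatched_student, teacher))
```

In practice this term would be averaged over frames and minimized with respect to the student's parameters; that part is omitted here.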

Report

(4 results)

2015 Annual Research Report
Final Research Report (PDF)
2014 Annual Research Report
2013 Annual Research Report

Research Products
(12 results)

Journal Article (1 result; of which Peer Reviewed: 1, Open Access: 1) Presentation (11 results; of which Invited: 5)

[Journal Article] Wise Teachers Train Better DNN Acoustic Models (2016)
- Author(s)
  R. Price, K. Iso, K. Shinoda
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2016 Issue: 1 Pages: 1-19
- DOI
  10.1186/s13636-016-0088-7
- NAID
  120006582513
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access
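[Presentation] Pattern Recognition Using Deep Learning for Speech, Images, and Video (2015)
- Author(s)
  Koichi Shinoda
- Organizer
  Japanese Society for Artificial Intelligence, Special Interest Group on AI Challenges
- Place of Presentation
  Keio University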
- Year and Date
  2015-11-12
- Related Report
  2015 Annual Research Report
- Invited
[Presentation] A DNN-Based ASR System for the Indonesian Language (2015)
- Author(s)
  Devin Hoesen, Ryan Price, Puji Lestari Dessi, Koichi Shinoda
- Organizer
  Acoustical Society of Japan, 2015 Autumn Meeting
- Place of Presentation
  University of Aizu
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
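[Presentation] Speech Recognition with LSTM Using Parameter Control of Activation Functions (2015)
- Author(s)
  Yusuke Matsuyama (松山祐輔), Ryan Price, Koichi Shinoda
- Organizer
  Acoustical Society of Japan, 2015 Autumn Meeting
- Place of Presentation
  University of Aizu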
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
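[Presentation] Deep Learning for Speech Recognition (2015)
- Author(s)
  Koichi Shinoda
- Organizer
  25th Annual Conference of the Japanese Neural Network Society
- Place of Presentation
  The University of Electro-Communications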
- Year and Date
  2015-09-02
- Related Report
  2015 Annual Research Report
- Invited
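[Presentation] Semantic Indexing of Video Based on Integration of Multiple Features Extracted from CNNs (2015)
- Author(s)
  Shun Fukuda (福田竣), Nakamasa Inoue, Koichi Shinoda
- Organizer
  21st Symposium on Sensing via Image Information (SSII)
- Place of Presentation
  Pacifico Yokohama Annex Hall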
- Year and Date
  2015-06-10
- Related Report
  2015 Annual Research Report
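[Presentation] Intermediate Representations for Statistical Pattern Recognition (2015)
- Author(s)
  Koichi Shinoda
- Organizer
  IEICE SIP/AE/SP Technical Committee Meeting, March 2015
- Place of Presentation
  Hotel Miyahira, Ishigaki Island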
- Year and Date
  2015-03-02
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] A new speech recognition paradigm based on deep learning (2015)
- Author(s)
  Koichi Shinoda
- Organizer
  APSIPA distinguished lecture
- Place of Presentation
  University of Science, VNU-HCM (Vietnam)
- Year and Date
  2015-01-15
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Speaker Adaptation of Deep Neural Networks Using a Hierarchy of Output Layers (2014)
- Author(s)
  Ryan Price, Kenichi Iso, Koichi Shinoda
- Organizer
  IEEE Spoken Language Technology (SLT) Workshop
- Place of Presentation
  South Lake Tahoe (USA)
- Year and Date
  2014-12-07 – 2014-12-10
- Related Report
  2014 Annual Research Report
[Presentation] TokyoTech-Waseda at TRECVID 2014 (2014)
- Author(s)
  Nakamasa Inoue, Zhuolin Liang, Mengxi Lin, Tran Hai Dang, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki
- Organizer
  NIST TRECVID workshop 2014
- Place of Presentation
  University of Central Florida (USA)
- Year and Date
  2014-11-10 – 2014-11-12
- Related Report
  2014 Annual Research Report
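[Presentation] A New Speech Recognition Paradigm Based on Deep Learning (2014)
- Author(s)
  Koichi Shinoda
- Organizer
  Japanese Neural Network Society seminar "The World Opened Up by Deep Learning"
- Place of Presentation
  Kyoto University Tokyo Office (Shinagawa)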
- Year and Date
  2014-08-26
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Combining Deep Speaker Specific Representations with GMM-SVM for Speaker Verification (2013)
- Author(s)
  Ryan Price, Sangeeta Biswas, Koichi Shinoda
- Organizer
  INTERSPEECH2013
- Place of Presentation
  Lyon, France
- Related Report
  2013 Annual Research Report
