
Speech information processing using deep generative models and their factorization

Research Project

Project/Area Number 25280058
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Partial Multi-year Fund
Section General
Research Field Perceptual information processing
Research Institution Tokyo Institute of Technology

Principal Investigator

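Shinoda Koichi  Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (10343097)

Co-Investigator (Kenkyū-buntansha) IWANO Koji  Tokyo City University, Faculty of Media, Professor (90323823)
SHINOZAKI Takahiro  Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Associate Professor (80447903)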
Project Period (FY) 2013-04-01 – 2016-03-31
Project Status Completed (Fiscal Year 2015)
Budget Amount
¥16,900,000 (Direct Cost: ¥13,000,000, Indirect Cost: ¥3,900,000)
Fiscal Year 2015: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2014: ¥6,500,000 (Direct Cost: ¥5,000,000, Indirect Cost: ¥1,500,000)
Fiscal Year 2013: ¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000)
Keywords Speech information processing / Deep learning / Speaker adaptation / Multimodal processing
Outline of Final Research Achievements

In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data from many speakers. In this study, we developed a framework that improves the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of the network. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method that uses a student-teacher learning framework with soft targets. These methods improved speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.
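As a rough illustration of the student-teacher idea with soft targets mentioned above, the following minimal sketch computes a cross-entropy loss between a teacher's softened output distribution and a student's. This is a common formulation of soft-target training, not the project's actual implementation; the function names, the temperature value, and the example logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened outputs against the teacher's soft targets."""
    targets = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))

# Hypothetical logits over three output classes for a single speech frame.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher

loss_close = soft_target_loss(close_student, teacher_logits)
loss_far = soft_target_loss(far_student, teacher_logits)
```

A student whose outputs resemble the teacher's receives the lower loss, so minimizing this quantity pulls the student toward the teacher's full output distribution rather than toward a single hard label.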

Report

(4 results)
  • 2015 Annual Research Report   Final Research Report ( PDF )
  • 2014 Annual Research Report
  • 2013 Annual Research Report
  • Research Products

    (12 results)


All Journal Article (1 results) (of which Peer Reviewed: 1 results,  Open Access: 1 results) Presentation (11 results) (of which Invited: 5 results)

  • [Journal Article] Wise Teachers Train Better DNN Acoustic Models (2016)

    • Author(s)
      R. Price, K. Iso, K. Shinoda
    • Journal Title

      EURASIP Journal on Audio Speech and Music Processing

      Volume: 2016 Issue: 1 Pages: 1-19

    • DOI

      10.1186/s13636-016-0088-7

    • NAID

      120006582513

    • Related Report
      2015 Annual Research Report
    • Peer Reviewed / Open Access
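  • [Presentation] Pattern Recognition Using Deep Learning for Speech, Images, and Video (2015)

    • Author(s)
      Koichi Shinoda
    • Organizer
      Japanese Society for Artificial Intelligence, SIG on AI Challenges
    • Place of Presentation
      Keio University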
    • Year and Date
      2015-11-12
    • Related Report
      2015 Annual Research Report
    • Invited
  • [Presentation] A DNN-Based ASR System for the Indonesian Language (2015)

    • Author(s)
      Devin Hoesen, Ryan Price, Puji Lestari Dessi, Koichi Shinoda
    • Organizer
      Acoustical Society of Japan 2015 Autumn Meeting
    • Place of Presentation
      University of Aizu
    • Year and Date
      2015-09-16
    • Related Report
      2015 Annual Research Report
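  • [Presentation] Speech Recognition with LSTM Using Parameter Control of Activation Functions (2015)

    • Author(s)
      Yusuke Matsuyama, Ryan Price, Koichi Shinoda
    • Organizer
      Acoustical Society of Japan 2015 Autumn Meeting
    • Place of Presentation
      University of Aizu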
    • Year and Date
      2015-09-16
    • Related Report
      2015 Annual Research Report
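  • [Presentation] Deep Learning for Speech Recognition (2015)

    • Author(s)
      Koichi Shinoda
    • Organizer
      25th Annual Conference of the Japanese Neural Network Society
    • Place of Presentation
      The University of Electro-Communications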
    • Year and Date
      2015-09-02
    • Related Report
      2015 Annual Research Report
    • Invited
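  • [Presentation] Semantic Indexing of Video Based on Integration of Multiple Features Extracted from CNNs (2015)

    • Author(s)
      Shun Fukuda, Nakamasa Inoue, Koichi Shinoda
    • Organizer
      21st Symposium on Sensing via Image Information (SSII)
    • Place of Presentation
      Pacifico Yokohama Annex Hall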
    • Year and Date
      2015-06-10
    • Related Report
      2015 Annual Research Report
  • [Presentation] Intermediate Representations for Statistical Pattern Recognition (2015)

    • Author(s)
      Koichi Shinoda
    • Organizer
      IEICE SIP/AE/SP Technical Meeting, March 2015
    • Place of Presentation
      Hotel Miyahira, Ishigaki Island
    • Year and Date
      2015-03-02
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] A new speech recognition paradigm based on deep learning (2015)

    • Author(s)
      Koichi Shinoda
    • Organizer
      APSIPA distinguished lecture
    • Place of Presentation
      University of Science, VNU-HCM (Vietnam)
    • Year and Date
      2015-01-15
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Speaker Adaptation of Deep Neural Networks Using a Hierarchy of Output Layers (2014)

    • Author(s)
      Ryan Price, Kenichi Iso, Koichi Shinoda
    • Organizer
      IEEE Spoken Language Technology (SLT) Workshop
    • Place of Presentation
      South Lake Tahoe (USA)
    • Year and Date
      2014-12-07 – 2014-12-10
    • Related Report
      2014 Annual Research Report
  • [Presentation] TokyoTech-Waseda at TRECVID 2014 (2014)

    • Author(s)
      Nakamasa Inoue, Zhuolin Liang, Mengxi Lin, Tran Hai Dang, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki
    • Organizer
      NIST TRECVID workshop 2014
    • Place of Presentation
      University of Central Florida (USA)
    • Year and Date
      2014-11-10 – 2014-11-12
    • Related Report
      2014 Annual Research Report
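  • [Presentation] A New Speech Recognition Paradigm Based on Deep Learning (2014)

    • Author(s)
      Koichi Shinoda
    • Organizer
      Japanese Neural Network Society seminar "The World Opened Up by Deep Learning"
    • Place of Presentation
      Kyoto University Tokyo Office (Shinagawa)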
    • Year and Date
      2014-08-26
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Combining Deep Speaker Specific Representations with GMM-SVM for Speaker Verification (2013)

    • Author(s)
      Ryan Price, Sangeeta Biswas, Koichi Shinoda
    • Organizer
      INTERSPEECH2013
    • Place of Presentation
      Lyon, France
    • Related Report
      2013 Annual Research Report


Published: 2013-05-21   Modified: 2019-07-29  
