
Automatic speech recognition based on semi-autonomous learning for captioning lectures

Research Project

Project/Area Number: 16H02847
Research Category: Grant-in-Aid for Scientific Research (B)
Allocation Type: Single-year Grants
Section: General
Research Field: Perceptual information processing
Research Institution: Kyoto University

Principal Investigator: Kawahara Tatsuya, Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator (Kenkyū-buntansha): Akita Yuya, Kyoto University, Graduate School of Economics, Associate Professor (90402742)
Research Collaborator: Hirose Youko
Project Period (FY): 2016-04-01 – 2019-03-31
Project Status: Completed (Fiscal Year 2018)
Budget Amount:
¥16,250,000 (Direct Cost: ¥12,500,000, Indirect Cost: ¥3,750,000)
Fiscal Year 2018: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2017: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2016: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
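The budget breakdown can be cross-checked arithmetically. The short Python sketch below uses only the figures listed in this record; it confirms that the three fiscal years sum to the total and that in each year the indirect cost equals 30% of the direct cost (for example, ¥3,900,000 × 0.3 = ¥1,170,000). The 30% ratio is inferred from these numbers rather than stated in the record.

```python
# (direct cost, indirect cost) in yen per fiscal year, copied from the record.
BUDGET = {
    2016: (4_700_000, 1_410_000),
    2017: (3_900_000, 1_170_000),
    2018: (3_900_000, 1_170_000),
}
YEAR_TOTALS = {2016: 6_110_000, 2017: 5_070_000, 2018: 5_070_000}
TOTAL, TOTAL_DIRECT, TOTAL_INDIRECT = 16_250_000, 12_500_000, 3_750_000


def check_budget() -> int:
    """Check that the yearly figures are consistent and return the grand total."""
    for year, (direct, indirect) in BUDGET.items():
        assert direct + indirect == YEAR_TOTALS[year], year
        # Integer form of "indirect == 30% of direct", avoiding float rounding.
        assert indirect * 10 == direct * 3, year
    assert sum(d for d, _ in BUDGET.values()) == TOTAL_DIRECT
    assert sum(i for _, i in BUDGET.values()) == TOTAL_INDIRECT
    grand_total = sum(YEAR_TOTALS.values())
    assert grand_total == TOTAL
    return grand_total


print(f"¥{check_budget():,}")  # ¥16,250,000
```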
Keywords: speech recognition / content archive / machine learning / captioning / information accessibility
Outline of Final Research Achievements

We proposed a new end-to-end speech recognition framework that directly converts the speech signal into a word sequence. It was shown to achieve higher accuracy at a drastically faster speed than conventional systems. We also developed a captioning system built on server-based speech recognition, as well as a speech recognition package for PCs that is integrated with IPtalk, captioning software widely used in Japan. The software is freely available to the public.
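For readers unfamiliar with end-to-end recognition, one decoding step can be illustrated concretely. The sketch below shows greedy (best-path) decoding for a CTC-style acoustic-to-word model, one of the model types named among the research products: take the most probable token in each frame, merge consecutive repeats, and drop the blank symbol. This is a minimal illustration under stated assumptions, not the project's decoder; the vocabulary `VOCAB`, the blank index, and the toy posterior matrix are all hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical word vocabulary; index 0 is reserved for the CTC blank symbol.
VOCAB = ["<blank>", "speech", "recognition", "for", "lectures"]
BLANK = 0


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: pick the most probable token in each frame,
    merge consecutive repeats, then drop blanks."""
    output: List[int] = []
    prev: Optional[int] = None
    for probs in frame_probs:
        best = max(range(len(probs)), key=lambda k: probs[k])
        # Emit a token only when it differs from the previous frame's token;
        # a blank between two identical tokens therefore keeps both.
        if best != prev and best != blank:
            output.append(best)
        prev = best
    return output


def ids_to_words(ids: Sequence[int]) -> List[str]:
    return [VOCAB[i] for i in ids]


if __name__ == "__main__":
    # Toy per-frame posteriors (rows: frames, columns: VOCAB entries), invented for illustration.
    toy_frames = [
        [0.10, 0.80, 0.05, 0.03, 0.02],  # "speech"
        [0.20, 0.70, 0.05, 0.03, 0.02],  # "speech" again (merged)
        [0.90, 0.05, 0.02, 0.02, 0.01],  # blank
        [0.10, 0.10, 0.70, 0.05, 0.05],  # "recognition"
        [0.60, 0.10, 0.10, 0.10, 0.10],  # blank
        [0.10, 0.05, 0.05, 0.75, 0.05],  # "for"
        [0.05, 0.05, 0.05, 0.05, 0.80],  # "lectures"
    ]
    print(" ".join(ids_to_words(ctc_greedy_decode(toy_frames))))
    # prints: speech recognition for lectures
```

Greedy decoding is the simplest option; practical systems typically use beam search, often with an external language model, which trades speed for accuracy.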

Academic Significance and Societal Importance of the Research Achievements

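Following the enforcement of the Act for Eliminating Discrimination against Persons with Disabilities, information accessibility for people with hearing impairments, namely captioning, is now required in lectures and talks; at present, however, it is insufficient in both quantity and quality. To support it, we conducted research and development on speech recognition technology. Introducing new deep-learning-based models yielded large improvements in both recognition accuracy and speed. In addition to a server-based system that adds captions to audio files (http://caption.ist.i.kyoto-u.ac.jp/), we integrated speech recognition into IPtalk, which is commonly used for PC-based summary captioning, and released it to the public. We also held a symposium titled "Captioning Technology for People with Hearing Impairments."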

Report

(4 results)
  • 2018 Annual Research Report
  • Final Research Report (PDF)
  • 2017 Annual Research Report
  • 2016 Annual Research Report

Research Products

(29 results)


Journal Article (10 results; of which Int'l Joint Research: 4, Peer Reviewed: 10, Open Access: 6) / Presentation (15 results; of which Int'l Joint Research: 11, Invited: 4) / Remarks (4 results)

  • [Journal Article] Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition (2019)

    • Author(s)
      K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara
    • Journal Title

      IEEE/ACM Trans. Audio, Speech & Language Processing

      Volume: 27 Issue: 5 Pages: 960-971

    • DOI

      10.1109/taslp.2019.2907015

    • NAID

      120006621539

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] A Dialogue Behavior Control Model for Expressing a Characters of Humanoid Robots (2018)

    • Author(s)
      K.Yamamoto, K.Inoue, S.Nakamura, K.Takanashi, and T.Kawahara
    • Journal Title

      Transactions of the Japanese Society for Artificial Intelligence

      Volume: 33 Issue: 5 Pages: C-I37_1-9

    • DOI

      10.1527/tjsai.C-I37

    • NAID

      130007481111

    • ISSN
      1346-0714, 1346-8030
    • Year and Date
      2018-09-01
    • Related Report
      2018 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening (2018)

    • Author(s)
      M.Mirzaei, K.Meshgi, and T.Kawahara
    • Journal Title

      Computer Speech and Language

      Volume: 49 Pages: 17-36

    • DOI

      10.1016/j.csl.2017.11.001

    • NAID

      120006605393

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue (2018)

    • Author(s)
      K.Inoue, D.Lala, K.Takanashi, and T.Kawahara
    • Journal Title

      APSIPA Trans. Signal & Information Processing

      Volume: 7-e9 Issue: 1 Pages: 1-16

    • DOI

      10.1017/atsip.2018.11

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms (2018)

    • Author(s)
      Y.Bando, K.Itoyama, M.Konyo, S.Tadokoro, K.Nakadai, K.Yoshii, T.Kawahara, and H.G.Okuno
    • Journal Title

      IEEE/ACM Trans. Audio, Speech & Language Processing

      Volume: 26 Issue: 2 Pages: 215-230

    • DOI

      10.1109/taslp.2017.2772340

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model (2018)

    • Author(s)
      K.Inoue, D.Lala, K.Yoshii, K.Takanashi, and T.Kawahara
    • Journal Title

      Transactions of the Japanese Society for Artificial Intelligence

      Volume: 33 Issue: 1 Pages: DSH-F_1-12

    • DOI

      10.1527/tjsai.DSH-F

    • NAID

      130006302231

    • ISSN
      1346-0714, 1346-8030
    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Partial and synchronized captioning: A new tool to assist learners in developing second language listening skill (2017)

    • Author(s)
      M.Mirzaei, K.Meshgi, Y.Akita, and T.Kawahara
    • Journal Title

      ReCALL Journal

      Volume: 29 Issue: 2 Pages: 178-199

    • DOI

      10.1017/s0958344017000039

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning (2017)

    • Author(s)
      R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E100.D Issue: 9 Pages: 2174-2182

    • DOI

      10.1587/transinf.2017EDP7019

    • NAID

      130006038443

    • ISSN
      0916-8532, 1745-1361
    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses (2016)

    • Author(s)
      S.Li, Y.Akita, and T.Kawahara
    • Journal Title

      IEEE/ACM Trans. Audio, Speech & Language Processing

      Volume: 24 Issue: 9 Pages: 1524-1534

    • DOI

      10.1109/taslp.2016.2562505

    • NAID

      120006027087

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Generating a Variety of Backchannel Forms Based on Linguistic and Prosodic Features for Attentive Listening Agents (2016)

    • Author(s)
      T.Yamaguchi, K.Inoue, K.Yoshino, K.Takanashi, N.G.Ward, and T.Kawahara
    • Journal Title

      Transactions of the Japanese Society for Artificial Intelligence

      Volume: 31 Issue: 4 Pages: C-G31_1-10

    • DOI

      10.1527/tjsai.C-G31

    • NAID

      130005254929

    • ISSN
      1346-0714, 1346-8030
    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Acoustic-to-word attention-based model complemented with character-level CTC-based model (2018)

    • Author(s)
      S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara
    • Organizer
      Proc. IEEE-ICASSP
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] An end-to-end approach to joint social signal detection and automatic speech recognition (2018)

    • Author(s)
      H.Inaguma, M.Mimura, K.Inoue, K.Yoshii, and T.Kawahara
    • Organizer
      Proc. IEEE-ICASSP
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition (2018)

    • Author(s)
      M.Mimura, S.Ueno, H.Inaguma, S.Sakai, and T.Kawahara
    • Organizer
      Proc. IEEE Spoken Language Technology Workshop (SLT)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Improving OOV detection and resolution with external language models in acoustic-to-word ASR (2018)

    • Author(s)
      H.Inaguma, M.Mimura, S.Sakai, and T.Kawahara
    • Organizer
      Proc. IEEE Spoken Language Technology Workshop (SLT)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Spoken dialogue system for a human-like conversational robot ERICA (2018)

    • Author(s)
      T.Kawahara
    • Organizer
      Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS)
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Semi-supervised ensemble DNN acoustic model training (2017)

    • Author(s)
      S.Li, X.Lu, S.Sakai, M.Mimura, and T.Kawahara
    • Organizer
      IEEE-ICASSP
    • Place of Presentation
      New Orleans, USA
    • Year and Date
      2017-03-05
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Effective articulatory modeling for pronunciation error detection of L2 learner without non-native training data (2017)

    • Author(s)
      R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
    • Organizer
      IEEE-ICASSP
    • Place of Presentation
      New Orleans, USA
    • Year and Date
      2017-03-05
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Social signal detection in spontaneous dialogue using bidirectional LSTM-CTC (2017)

    • Author(s)
      H.Inaguma, K.Inoue, M.Mimura, and T.Kawahara
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Listening difficulty detection to foster second language listening with the partial and synchronized caption system (2017)

    • Author(s)
      M.Mirzaei, K.Meshgi, and T.Kawahara
    • Organizer
      EUROCALL
    • Related Report
      2017 Annual Research Report
  • [Presentation] Modeling difficulties of second language learners using speech technology (2017)

    • Author(s)
      T.Kawahara
    • Organizer
      Seoul International Conference on Speech Sciences (SICSS)
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Automatic meeting transcription system for the Japanese Parliament (Diet) (2017)

    • Author(s)
      T.Kawahara
    • Organizer
      APSIPA ASC
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] What makes a quality transcript in Parliamentary reporting (2017)

    • Author(s)
      T.Kawahara
    • Organizer
      Intersteno
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Multi-lingual and multi-task DNN learning for articulatory error detection (2016)

    • Author(s)
      R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
    • Organizer
      APSIPA ASC
    • Place of Presentation
      Jeju, Korea
    • Year and Date
      2016-12-13
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Prediction and generation of backchannel form for attentive listening systems (2016)

    • Author(s)
      T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward
    • Organizer
      INTERSPEECH
    • Place of Presentation
      San Francisco, USA
    • Year and Date
      2016-09-08
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Leveraging automatic speech recognition errors to detect challenging speech segments in TED talks (2016)

    • Author(s)
      M.Mirzaei, K.Meshgi, and T.Kawahara
    • Organizer
      EUROCALL
    • Place of Presentation
      Limassol, Cyprus
    • Year and Date
      2016-08-24
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Remarks] Captioning Support Project Using Speech Recognition Technology

    • URL

      http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/

    • Related Report
      2018 Annual Research Report
  • [Remarks] Automatic Caption Generation System Using Speech Recognition

    • URL

      http://caption.ist.i.kyoto-u.ac.jp/

    • Related Report
      2018 Annual Research Report, 2017 Annual Research Report, 2016 Annual Research Report
  • [Remarks] Caption Creation Support Using Speech Recognition

    • URL

      http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/

    • Related Report
      2017 Annual Research Report
  • [Remarks] Captioning Support Using Speech Recognition Technology

    • URL

      http://sap.ist.i.kyoto-u.ac.jp/jimaku/

    • Related Report
      2016 Annual Research Report


Published: 2016-04-21   Modified: 2020-03-30  


Powered by NII kakenhi