Automatic speech recognition based on semi-autonomous learning for captioning lectures

Research Project

Project/Area Number	16H02847
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perceptual information processing
Research Institution	Kyoto University
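Principal Investigator	Kawahara Tatsuya Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator (Kenkyū-buntansha)	Akita Yuya Kyoto University, Graduate School of Economics, Associate Professor (90402742)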
Research Collaborator	Hirose Youko
Project Period (FY)	2016-04-01 – 2019-03-31
Project Status	Completed (Fiscal Year 2018)
Budget Amount	¥16,250,000 (Direct Cost: ¥12,500,000, Indirect Cost: ¥3,750,000); Fiscal Year 2018: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000); Fiscal Year 2017: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000); Fiscal Year 2016: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
Keywords	speech recognition / content archive / machine learning / captioning / information accessibility
Outline of Final Research Achievements	We proposed a new end-to-end speech recognition framework that converts a speech signal directly into a word sequence. It achieved higher accuracy at a drastically faster speed than conventional systems. We also developed a captioning system built on a server-based speech recognition system, as well as a speech recognition package for PCs that is integrated with IPtalk, the captioning software widely used in Japan. The software is freely available to the public.
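Academic Significance and Societal Importance of the Research Achievements	With the enforcement of the Act for Eliminating Discrimination against Persons with Disabilities, lectures and talks are required to provide information accessibility, that is, captioning, for people with hearing impairments, but current provision is insufficient in both quantity and quality. We carried out research and development of speech recognition technology to support this. Introducing a new deep-learning-based model yielded large improvements in both recognition accuracy and speed. In addition to a server-based system that adds captions to audio files (http://caption.ist.i.kyoto-u.ac.jp/), we integrated speech recognition into IPtalk, which is commonly used for PC-based summary transcription, and released it to the public. We also held a symposium on "Captioning Technology for the Hearing Impaired".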

Report

(4 results)

2018 Annual Research Report
Final Research Report (PDF)
2017 Annual Research Report
2016 Annual Research Report

Research Products
(29 results)

Journal Article (10 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 10 results, Open Access: 6 results)
Presentation (15 results) (of which Int'l Joint Research: 11 results, Invited: 4 results)
Remarks (4 results)

[Journal Article] Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition (2019)
- Author(s)
  K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Processing
  Volume: 27 Issue: 5 Pages: 960-971
- DOI
  10.1109/taslp.2019.2907015
- NAID
  120006621539
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] A Dialogue Behavior Control Model for Expressing a Character of Humanoid Robots (2018)
- Author(s)
  K.Yamamoto, K.Inoue, S.Nakamura, K.Takanashi, and T.Kawahara
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 33 Issue: 5 Pages: C-I37_1-9
- DOI
  10.1527/tjsai.C-I37
- NAID
  130007481111
- ISSN
  1346-0714, 1346-8030
- Year and Date
  2018-09-01
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening (2018)
- Author(s)
  M.Mirzaei, K.Meshgi, and T.Kawahara
- Journal Title
  Computer Speech and Language
  Volume: 49 Pages: 17-36
- DOI
  10.1016/j.csl.2017.11.001
- NAID
  120006605393
- Related Report
  2018 Annual Research Report
- Peer Reviewed
[Journal Article] Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue (2018)
- Author(s)
  K.Inoue, D.Lala, K.Takanashi, and T.Kawahara
- Journal Title
  APSIPA Trans. Signal & Information Processing
  Volume: 7-e9 Issue: 1 Pages: 1-16
- DOI
  10.1017/atsip.2018.11
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms (2018)
- Author(s)
  Y.Bando, K.Itoyama, M.Konyo, S.Tadokoro, K.Nakadai, K.Yoshii, T.Kawahara, and H.G.Okuno
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Processing
  Volume: 26 Issue: 2 Pages: 215-230
- DOI
  10.1109/taslp.2017.2772340
- Related Report
  2017 Annual Research Report
- Peer Reviewed
[Journal Article] Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model (2018)
- Author(s)
  K.Inoue, D.Lala, K.Yoshii, K.Takanashi, and T.Kawahara
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 33 Issue: 1 Pages: DSH-F_1-12
- DOI
  10.1527/tjsai.DSH-F
- NAID
  130006302231
- ISSN
  1346-0714, 1346-8030
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Partial and synchronized captioning: A new tool to assist learners in developing second language listening skill (2017)
- Author(s)
  M.Mirzaei, K.Meshgi, Y.Akita, and T.Kawahara
- Journal Title
  ReCALL Journal
  Volume: 29 Issue: 2 Pages: 178-199
- DOI
  10.1017/s0958344017000039
- Related Report
  2017 Annual Research Report
- Peer Reviewed
[Journal Article] Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning (2017)
- Author(s)
  R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E100.D Issue: 9 Pages: 2174-2182
- DOI
  10.1587/transinf.2017EDP7019
- NAID
  130006038443
- ISSN
  0916-8532, 1745-1361
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Semi-supervised acoustic model training by discriminative data selection from multiple ASR systems' hypotheses (2016)
- Author(s)
  S.Li, Y.Akita, and T.Kawahara
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Processing
  Volume: 24 Issue: 9 Pages: 1524-1534
- DOI
  10.1109/taslp.2016.2562505
- NAID
  120006027087
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Generating a Variety of Backchannel Forms Based on Linguistic and Prosodic Features for Attentive Listening Agents (2016)
- Author(s)
  T.Yamaguchi, K.Inoue, K.Yoshino, K.Takanashi, N.G.Ward, and T.Kawahara
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 31 Issue: 4 Pages: C-G31_1-10
- DOI
  10.1527/tjsai.C-G31
- NAID
  130005254929
- ISSN
  1346-0714, 1346-8030
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Acoustic-to-word attention-based model complemented with character-level CTC-based model (2018)
- Author(s)
  S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  Proc. IEEE-ICASSP
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] An end-to-end approach to joint social signal detection and automatic speech recognition (2018)
- Author(s)
  H.Inaguma, M.Mimura, K.Inoue, K.Yoshii, and T.Kawahara
- Organizer
  Proc. IEEE-ICASSP
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition (2018)
- Author(s)
  M.Mimura, S.Ueno, H.Inaguma, S.Sakai, and T.Kawahara
- Organizer
  Proc. IEEE Spoken Language Technology Workshop (SLT)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Improving OOV detection and resolution with external language models in acoustic-to-word ASR (2018)
- Author(s)
  H.Inaguma, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  Proc. IEEE Spoken Language Technology Workshop (SLT)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Spoken dialogue system for a human-like conversational robot ERICA (2018)
- Author(s)
  T.Kawahara
- Organizer
  Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Semi-supervised ensemble DNN acoustic model training (2017)
- Author(s)
  S.Li, X.Lu, S.Sakai, M.Mimura, and T.Kawahara
- Organizer
  IEEE-ICASSP
- Place of Presentation
  New Orleans, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Effective articulatory modeling for pronunciation error detection of L2 learner without non-native training data (2017)
- Author(s)
  R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
- Organizer
  IEEE-ICASSP
- Place of Presentation
  New Orleans, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Social signal detection in spontaneous dialogue using bidirectional LSTM-CTC (2017)
- Author(s)
  H.Inaguma, K.Inoue, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Listening difficulty detection to foster second language listening with the partial and synchronized caption system (2017)
- Author(s)
  M.Mirzaei, K.Meshgi, and T.Kawahara
- Organizer
  EUROCALL
- Related Report
  2017 Annual Research Report
[Presentation] Modeling difficulties of second language learners using speech technology (2017)
- Author(s)
  T.Kawahara
- Organizer
  Seoul International Conference on Speech Sciences (SICSS)
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Automatic meeting transcription system for the Japanese Parliament (Diet) (2017)
- Author(s)
  T.Kawahara
- Organizer
  APSIPA ASC
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] What makes a quality transcript in Parliamentary reporting (2017)
- Author(s)
  T.Kawahara
- Organizer
  Intersteno
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Multi-lingual and multi-task DNN learning for articulatory error detection (2016)
- Author(s)
  R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
- Organizer
  APSIPA ASC
- Place of Presentation
  Jeju, Korea
- Year and Date
  2016-12-13
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Prediction and generation of backchannel form for attentive listening systems (2016)
- Author(s)
  T.Kawahara, T.Yamaguchi, K.Inoue, K.Takanashi, and N.Ward
- Organizer
  INTERSPEECH
- Place of Presentation
  San Francisco, USA
- Year and Date
  2016-09-08
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Leveraging automatic speech recognition errors to detect challenging speech segments in TED talks (2016)
- Author(s)
  M.Mirzaei, K.Meshgi, and T.Kawahara
- Organizer
  EUROCALL
- Place of Presentation
  Limassol, Cyprus
- Year and Date
  2016-08-24
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Remarks] Captioning support project using speech recognition technology
- URL
  http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/
- Related Report
  2018 Annual Research Report
[Remarks] Automatic captioning system using speech recognition
- URL
  http://caption.ist.i.kyoto-u.ac.jp/
- Related Report
  2018 Annual Research Report, 2017 Annual Research Report, 2016 Annual Research Report
[Remarks] Captioning support using speech recognition
- URL
  http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/
- Related Report
  2017 Annual Research Report
[Remarks] Captioning support using speech recognition technology
- URL
  http://sap.ist.i.kyoto-u.ac.jp/jimaku/
- Related Report
  2016 Annual Research Report
