2018 Fiscal Year Annual Research Report

Automatic speech recognition based on semi-autonomous learning for captioning lectures

Research Project

Project/Area Number	16H02847
Research Institution	Kyoto University
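Principal Investigator	河原達也 (Tatsuya Kawahara), Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	秋田祐哉 (Yuya Akita), Kyoto University, Graduate School of Economics, Associate Professor (90402742)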
Project Period (FY)	2016-04-01 – 2019-03-31
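Keywords	Speech recognition / Content archive / Machine learning / Captioning
Outline of Annual Research Achievements	We studied new models and algorithms for speech recognition and improved the captioning system for talks and lectures. (1) We realized end-to-end speech recognition, which models the acoustic model and the language model jointly with a neural network and derives the recognized word sequence directly from the input speech. In particular, we proposed a method for stably training models that use words as the output unit, and showed that, compared with conventional standard speech recognition methods, it achieves high recognition accuracy while reducing processing time to 1/30 or less. (2) Because the above end-to-end speech recognition system becomes specialized to its training data, including the vocabulary, we investigated various methods for adapting it to new domains. In particular, we proposed a method that uses speech synthesis to generate pseudo speech data for training, and demonstrated its feasibility. (3) We continued trial operation of the system that adds captions to audio files of talks and lectures (http://caption.ist.i.kyoto-u.ac.jp/). This system is also used at the National Graduate Institute for Policy Studies, the National Institute for Japanese Language and Linguistics, and other institutions. (4) We integrated the speech recognition software developed in this project into IPtalk (http://www.s-kurita.net/), software that adds captions in real time to provide information access for deaf and hard-of-hearing people, and released it to the public. To introduce both this project and the software, we held the symposium "Captioning Technology for Deaf and Hard-of-Hearing People" (『聴覚障害者のための字幕付与技術』) at Kyoto University in December 2018. It drew 143 participants, including deaf and hard-of-hearing people and summary transcribers, who exchanged a wide range of views on the outlook for the technology.
Research Progress Status	Not entered, because FY2018 (Heisei 30) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2018 (Heisei 30) is the final year of the project.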

Research Products
(11 results)


Journal Article (4 results) (of which Peer Reviewed: 4 results, Open Access: 1 result) Presentation (5 results) (of which Int'l Joint Research: 5 results, Invited: 1 result) Remarks (2 results)

[Journal Article] Unsupervised speech enhancement based on multichannel NMF-informed beamforming for noise-robust automatic speech recognition (2019)
- Author(s)
  K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara
- Journal Title
  
  IEEE/ACM Trans. Audio, Speech & Language Processing
  
  Volume: 27 Pages: (to appear)
- DOI
  https://doi.org/10.1109/TASLP.2019.2907015
- Peer Reviewed
[Journal Article] Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening (2018)
- Author(s)
  M.Mirzaei, K.Meshgi, and T.Kawahara
- Journal Title
  
  Computer Speech and Language
  
  Volume: 49 Pages: 17-36
- DOI
  https://doi.org/10.1016/j.csl.2017.11.001
- Peer Reviewed
[Journal Article] Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue (2018)
- Author(s)
  K.Inoue, D.Lala, K.Takanashi, and T.Kawahara
- Journal Title
  
  APSIPA Trans. Signal & Information Processing
  
  Volume: 7-e9 Pages: 1-16
- DOI
  https://doi.org/10.1017/ATSIP.2018.11
- Peer Reviewed / Open Access
[Journal Article] 人間型ロボットのキャラクタ表現のための対話の振る舞い制御モデル [A dialogue behavior control model for expressing the character of a humanoid robot] (2018)
- Author(s)
  K.Yamamoto, K.Inoue, S.Nakamura, K.Takanashi, and T.Kawahara
- Journal Title
  
  人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  
  Volume: 33 Pages: C-I37_1-9
- DOI
  https://doi.org/10.1527/tjsai.C-I37
- Peer Reviewed
[Presentation] Acoustic-to-word attention-based model complemented with character-level CTC-based model (2018)
- Author(s)
  S.Ueno, H.Inaguma, M.Mimura, and T.Kawahara
- Organizer
  Proc. IEEE-ICASSP
- Int'l Joint Research
[Presentation] An end-to-end approach to joint social signal detection and automatic speech recognition (2018)
- Author(s)
  H.Inaguma, M.Mimura, K.Inoue, K.Yoshii, and T.Kawahara
- Organizer
  Proc. IEEE-ICASSP
- Int'l Joint Research
[Presentation] Leveraging sequence-to-sequence speech synthesis for enhancing acoustic-to-word speech recognition (2018)
- Author(s)
  M.Mimura, S.Ueno, H.Inaguma, S.Sakai, and T.Kawahara
- Organizer
  Proc. IEEE Spoken Language Technology Workshop (SLT)
- Int'l Joint Research
[Presentation] Improving OOV detection and resolution with external language models in acoustic-to-word ASR (2018)
- Author(s)
  H.Inaguma, M.Mimura, S.Sakai, and T.Kawahara
- Organizer
  Proc. IEEE Spoken Language Technology Workshop (SLT)
- Int'l Joint Research
[Presentation] Spoken dialogue system for a human-like conversational robot ERICA (2018)
- Author(s)
  T.Kawahara
- Organizer
  Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS)
- Int'l Joint Research / Invited
[Remarks] Captioning support project using speech recognition technology
- URL
  http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/
[Remarks] Automatic caption generation system using speech recognition
- URL
  http://caption.ist.i.kyoto-u.ac.jp/
