2017 Fiscal Year Annual Research Report

Captioning of Talks and Lectures Based on Semi-Autonomous Speech Recognition

Research Project

Project/Area Number	16H02847
Research Institution	Kyoto University
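Principal Investigator	Tatsuya Kawahara (河原達也), Kyoto University, Graduate School of Informatics, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	Yuya Akita (秋田祐哉), Kyoto University, Graduate School of Economics, Associate Professor (90402742)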
Project Period (FY)	2016-04-01 – 2019-03-31
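Keywords	Speech recognition / Content archive / Machine learning / Captioning
Outline of Annual Research Achievements	Targeting mainly lectures of the Open University of Japan and talks at academic conferences, we improved our captioning system while pursuing several lines of research on speech recognition methods. (1) For neural-network-based acoustic models, we investigated end-to-end methods such as CTC (Connectionist Temporal Classification) and attention models. These methods replace the conventional DNN-HMM. We also investigated a method that detects events such as fillers and disfluencies in an integrated manner within the CTC framework. (2) For neural-network-based language models, we investigated a scheme that realizes an end-to-end model within the framework of a word-level attention model. This scheme builds and optimizes the acoustic model and the language model jointly; compared with the conventional hierarchical approach, it has a very simple architecture and achieves a speedup of more than 25 times. After various investigations of recognition accuracy, it is approaching a level that exceeds the conventional approach. (3) We released the captioning system (http://caption.ist.i.kyoto-u.ac.jp/) to the public and operated it on a trial basis. The system was used to caption online lectures of the Open University of Japan and was also provided to the National Graduate Institute for Policy Studies and the National Institute for Japanese Language and Linguistics. (4) We continued research on real-time captioning as information support for deaf and hard-of-hearing people, and provided captions for talks at several special interest groups of the Information Processing Society of Japan (SIG-SLP, SIG-AAC). (5) For English lecture content, we studied a listening-training system that selectively captions the portions that are difficult to hear.
Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: In addition to publishing papers, we released the system to the public and provided it to other research institutions.
Strategy for Future Research Activity	We will continue research on speech recognition methods while operating the system on a trial basis and improving it.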
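
As an illustration of the CTC framework mentioned in item (1) of the outline, the following minimal Python sketch shows the standard greedy CTC decoding rule: merge consecutive repeated frame labels, then drop the blank symbol. It is not the project's implementation. The blank symbol `_`, the `<filler>` event token, and the per-frame labels are all assumptions chosen for illustration, and the sketch only suggests how a filler event could share the output vocabulary with ordinary tokens.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; real systems usually reserve an integer index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame best labels into an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical per-frame argmax labels. "<filler>" is an assumed event token
# that sits in the same vocabulary as ordinary tokens.
frames = ["_", "<filler>", "<filler>", "_", "so", "so", "_", "so", "we", "_"]

decoded = ctc_greedy_collapse(frames)
print(decoded)
```

Note that the blank between the two runs of `so` keeps them as two separate output tokens, which is why the blank symbol is needed to represent genuine repetitions.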

Research Products
(11 results)

Journal Article (4 results; of which Peer Reviewed: 4, Open Access: 1) Presentation (5 results; of which Int'l Joint Research: 1, Invited: 3) Remarks (2 results)

[Journal Article] Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms (2018)
- Author(s)
  Y.Bando, K.Itoyama, M.Konyo, S.Tadokoro, K.Nakadai, K.Yoshii, T.Kawahara, and H.G.Okuno
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Processing
  Volume: 26 Pages: 215-230
- DOI
  http://dx.doi.org/10.1109/TASLP.2017.2772340
- Peer Reviewed
[Journal Article] Partial and synchronized captioning: A new tool to assist learners in developing second language listening skill (2017)
- Author(s)
  M.Mirzaei, K.Meshgi, Y.Akita, and T.Kawahara
- Journal Title
  ReCALL Journal
  Volume: 29 Pages: 178-199
- DOI
  https://doi.org/10.1017/S0958344017000039
- Peer Reviewed
[Journal Article] Articulatory modeling for pronunciation error detection without non-native training data based on DNN transfer learning (2017)
- Author(s)
  R.Duan, T.Kawahara, M.Dantsuji, and J.Zhang
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E100-D Pages: 2174-2182
- DOI
  https://doi.org/10.1587/transinf.2017EDP7019
- Peer Reviewed
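[Journal Article] Estimation of dialogue engagement based on listener behaviors using a latent character model (original title: 潜在キャラクタモデルによる聞き手のふるまいに基づく対話エンゲージメントの推定) (2017)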
- Author(s)
  K.Inoue, D.Lala, K.Yoshii, K.Takanashi, and T.Kawahara
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 33 Pages: DSH-F_1-12
- DOI
  https://doi.org/10.1527/tjsai.DSH-F
- Peer Reviewed / Open Access
[Presentation] Social signal detection in spontaneous dialogue using bidirectional LSTM-CTC (2017)
- Author(s)
  H.Inaguma, K.Inoue, M.Mimura, and T.Kawahara
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Listening difficulty detection to foster second language listening with the partial and synchronized caption system (2017)
- Author(s)
  M.Mirzaei, K.Meshgi, and T.Kawahara
- Organizer
  EUROCALL
[Presentation] Modeling difficulties of second language learners using speech technology (2017)
- Author(s)
  T.Kawahara
- Organizer
  Seoul International Conference on Speech Sciences (SICSS)
- Invited
[Presentation] Automatic meeting transcription system for the Japanese Parliament (Diet) (2017)
- Author(s)
  T.Kawahara
- Organizer
  APSIPA ASC
- Invited
[Presentation] What makes a quality transcript in Parliamentary reporting (2017)
- Author(s)
  T.Kawahara
- Organizer
  Intersteno
- Invited
[Remarks] Automatic captioning system using speech recognition
- URL
  http://caption.ist.i.kyoto-u.ac.jp/
[Remarks] Captioning support using speech recognition
- URL
  http://www.sap.ist.i.kyoto-u.ac.jp/jimaku/
