2006 Fiscal Year Final Research Report Summary

Automatic Speech Recognition and Understanding of Lectures and Discussions for Effective Multi-media Archiving

Research Project

Project/Area Number	16200011
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Kyoto University
Principal Investigator	KAWAHARA Tatsuya, Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	MINOH Michihiko, Kyoto University, Academic Center for Computing and Media Studies, Professor (70166099); FURUI Sadaoki, Tokyo Institute of Technology, Department of Computer Science, Professor (90293076); AKITA Yuya, Academic Center for Computing and Media Studies, Assistant Professor (90402742)
Project Period (FY)	2004 – 2006
Keywords	Speech processing / Speech recognition / Speech archives / Spontaneous speech / Meta-data extraction / Oral presentation / Lecture / Discussion
Research Abstract	We investigated automatic speech recognition and post-processing of transcripts for oral presentations at academic meetings, lectures at universities, and discussions on TV programs and in parliament. Spontaneous speech of this kind shows large variation in pronunciation and a wide range of colloquial expressions, so elaborate modeling and robust statistical training are necessary. Moreover, because topics and vocabularies vary widely, the lexicon and language model must also be adapted to each lecture or discussion. For this purpose, we conducted the following studies.
- Generalized statistical modeling of pronunciation variation
- Transformation of the language model to spoken style based on a statistical machine translation framework
- Language model adaptation based on PLSA of topics and speakers
- Language model adaptation using slide information
- Language model adaptation based on topic segmentation of meetings
The transcripts (speech recognition results) of spontaneous speech are not suitable for archiving as they are. Disfluencies and colloquial expressions must be cleaned up, and sentence boundaries must be marked. Thus, we conducted the following studies.
- Sentence boundary detection using dependency structure analysis
- Clause boundary detection using local syntactic dependency
- Detection of quotations and inserted clauses
- Detection and correction of self-repairs
We also conducted the following studies for effective indexing of speech archives.
- Indexing of key sentences in oral presentations
- Alignment of utterances with slides used in lectures

Research Products
(14 results)

All Journal Article (12 results) Book (1 result) Patent(Industrial Property Rights) (1 result)

[Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. 2005
- Author(s)
  M.Nishida, T.Kawahara
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.13, No.4
  Pages: 583-592
- Description
  From the "Summary of Research Results Report (Japanese)"
[Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions. 2005
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  IEICE Trans. Vol.E88-D, No.3
  Pages: 439-445
- Description
  From the "Summary of Research Results Report (Japanese)"
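[Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition (in Japanese). 2005
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  IEICE Transactions (Japanese Edition) Vol.J88-DII, No.9
  Pages: 1780-1789
- Description
  From the "Summary of Research Results Report (Japanese)"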
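[Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese (in Japanese). 2005
- Author(s)
  K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara
- Journal Title
  Journal of Natural Language Processing Vol.12, No.3
  Pages: 3-17
- Description
  From the "Summary of Research Results Report (Japanese)"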
[Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. 2005
- Author(s)
  M.Nishida, T.Kawahara.
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.13, No.4
  Pages: 583-592
- Description
  From the "Summary of Research Results Report (English)"
[Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions. 2005
- Author(s)
  Y.Akita, T.Kawahara.
- Journal Title
  IEICE Trans. Vol.E88-D, No.3
  Pages: 439-445
- Description
  From the "Summary of Research Results Report (English)"
[Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition. 2005
- Author(s)
  Y.Akita, T.Kawahara.
- Journal Title
  IEICE Trans. Information and Systems. Vol.J88-DII, No.9
  Pages: 1780-1789
- Description
  From the "Summary of Research Results Report (English)"
[Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese. 2005
- Author(s)
  K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara.
- Journal Title
  Journal of Natural Language Processing. Vol.12, No.3
  Pages: 3-17
- Description
  From the "Summary of Research Results Report (English)"
[Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers. 2004
- Author(s)
  T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  Pages: 409-419
- Description
  From the "Summary of Research Results Report (Japanese)"
[Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition. 2004
- Author(s)
  H.Nanjo, T.Kawahara
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  Pages: 391-400
- Description
  From the "Summary of Research Results Report (Japanese)"
[Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers. 2004
- Author(s)
  T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo.
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  Pages: 409-419
- Description
  From the "Summary of Research Results Report (English)"
[Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition. 2004
- Author(s)
  H.Nanjo, T.Kawahara.
- Journal Title
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  Pages: 391-400
- Description
  From the "Summary of Research Results Report (English)"
[Book] Spoken Dialogue Systems (音声対話システム). 2006
- Author(s)
  T.Kawahara, M.Araki
- Total Pages
  208
- Publisher
  Ohmsha
- Description
  From the "Summary of Research Results Report (Japanese)"
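[Patent(Industrial Property Rights)] Speech segment detection device, and computer program and recording medium therefor. 2005
- Inventor(s)
  T.Kawahara, Y.Kida
- Industrial Property Rights Holder
  Kyoto University
- Industrial Property Number
  Japanese Patent Application No. 2005-197804 (特願2005-197804)
- Filing Date
  2005-07-06
- Description
  From the "Summary of Research Results Report (Japanese)"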