Automatic Speech Recognition and Understanding of Lectures and Discussions for Effective Multi-media Archiving

Research Project

Project/Area Number	16200011
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Kyoto University
Principal Investigator	KAWAHARA Tatsuya, Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	MINOH Michihiko, Kyoto University, Academic Center for Computing and Media Studies, Professor (70166099)
FURUI Sadaoki, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Department of Computer Science, Professor (90293076)
AKITA Yuya, Academic Center for Computing and Media Studies, Assistant Professor (90402742)
Project Period (FY)	2004 – 2006
Project Status	Completed (Fiscal Year 2006)
Budget Amount	¥45,110,000 (Direct Cost: ¥34,700,000, Indirect Cost: ¥10,410,000)
Fiscal Year 2006: ¥12,220,000 (Direct Cost: ¥9,400,000, Indirect Cost: ¥2,820,000)
Fiscal Year 2005: ¥14,170,000 (Direct Cost: ¥10,900,000, Indirect Cost: ¥3,270,000)
Fiscal Year 2004: ¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000)
Keywords	Speech processing / Speech recognition / Speech archives / Spontaneous speech / Meta-data extraction / Oral presentation / Lecture / Discussion
Research Abstract	We investigated automatic speech recognition and post-processing of transcripts for oral presentations at academic meetings, university lectures, and discussions on TV programs and in parliaments. Spontaneous speech of these kinds shows large variation in pronunciation and many colloquial expressions, so it requires elaborate modeling and robust statistical training. Moreover, because topics and vocabularies vary widely, the lexicon and language model must also be adapted to each lecture or discussion. For this purpose, we conducted the following studies.
- Generalized statistical modeling of pronunciation variation
- Transformation of the language model to spoken style based on a statistical machine translation framework
- Language model adaptation based on PLSA of topics and speakers
- Language model adaptation using slide information
- Language model adaptation based on topic segmentation of meetings
The transcripts (speech recognition results) of spontaneous speech are not suitable for archiving as they are. Disfluencies and colloquial expressions must be cleaned and sentence boundaries marked. Thus, we conducted the following studies.
- Sentence boundary detection using dependency structure analysis
- Clause boundary detection using local syntactic dependency
- Detection of quotations and inserted clauses
- Detection and correction of self-repairs
We also conducted the following studies for effective indexing of speech archives.
- Indexing of key sentences in oral presentations
- Alignment of utterances with slides used in lectures

Report

(4 results)

2006 Annual Research Report, Final Research Report Summary
2005 Annual Research Report
2004 Annual Research Report

Research Products
(33 results)

Journal Article (30 results), Book (2 results), Patent (Industrial Property Rights) (1 result)

[Journal Article] Out-of-domain utterance detection using classification confidences of multiple topics (2007)
- Author(s)
  I.R.Lane, T.Kawahara, T.Matsui, S.Nakamura
- Journal Title
  
  IEEE Trans. Audio, Speech & Language Processing Vol. 15, No. 1
  
  Pages: 150-161
- NAID
  120002511372
- Related Report
  2006 Annual Research Report
[Journal Article] Intelligent transcription system based on spontaneous speech processing (2007)
- Author(s)
  T.Kawahara
- Journal Title
  
  Proc. Int'l Conference on Informatics Research for Development of Knowledge Society Infrastructure
  
  Pages: 19-26
- Related Report
  2006 Annual Research Report
[Journal Article] Noise-robust voice activity detection based on weighted integration of multiple features (2006)
- Author(s)
  Y.Kida, T.Kawahara
- Journal Title
  
  IEICE Trans. (Japanese Edition) Vol. J89-DII, No. 8
  
  Pages: 1820-1828
- NAID
  110002952512
- Related Report
  2006 Annual Research Report
[Journal Article] Dialogue strategy to clarify user's queries for document retrieval system with speech interface (2006)
- Author(s)
  T.Misu, T.Kawahara
- Journal Title
  
  Speech Communication Vol. 48, No. 9
  
  Pages: 1137-1150
- Related Report
  2006 Annual Research Report
[Journal Article] Efficient estimation of language model statistics of spontaneous speech via statistical transformation model (2006)
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  Proc. IEEE-ICASSP 1
  
  Pages: 1049-1052
- Related Report
  2006 Annual Research Report
[Journal Article] Detection of quotations and inserted clauses and its application to dependency structure analysis in (2006)
- Author(s)
  R.Hamabe, K.Uchimoto, T.Kawahara, H.Isahara
- Journal Title
  
  Proc. COLING-ACL
  
  Pages: 324-330
- Related Report
  2006 Annual Research Report
[Journal Article] Verification of speech recognition results incorporating in-domain confidence and discourse coherence measures (2006)
- Author(s)
  I.R.Lane, T.Kawahara
- Journal Title
  
  IEICE Trans. Vol.E89-D・No.3
  
  Pages: 931-938
- NAID
  110004719366
- Related Report
  2005 Annual Research Report
[Journal Article] Trigger-based language model adaptation for automatic transcription of panel discussions (2006)
- Author(s)
  C.Troncoso, T.Kawahara
- Journal Title
  
  IEICE Trans. Vol.E89-D・No.3
  
  Pages: 1024-1031
- NAID
  110004719377
- Related Report
  2005 Annual Research Report
[Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing (2005)
- Author(s)
  M.Nishida, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.13, No.4
  
  Pages: 583-592
- NAID
  120002511373
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  IEICE Trans. Vol. E88-D, No.3
  
  Pages: 439-445
- NAID
  110003214204
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
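[Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition (2005)
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  IEICE Trans. (Japanese Edition) Vol.J88-DII, No.9
  
  Pages: 1780-1789
- NAID
  110003224132
- Description
  From the Final Research Report Summary (Japanese)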
- Related Report
  2006 Final Research Report Summary
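[Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese (2005)
- Author(s)
  K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara
- Journal Title
  
  Journal of Natural Language Processing Vol.12, No.3
  
  Pages: 3-17
- NAID
  10016629478
- Description
  From the Final Research Report Summary (Japanese)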
- Related Report
  2006 Final Research Report Summary
[Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing (2005)
- Author(s)
  M.Nishida, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.13, No.4
  
  Pages: 583-592
- NAID
  120002511373
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)
- Author(s)
  Y.Akita, T.Kawahara.
- Journal Title
  
  IEICE Trans. Vol.E88-D, No.3
  
  Pages: 439-445
- NAID
  110003214204
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition (2005)
- Author(s)
  Y.Akita, T.Kawahara.
- Journal Title
  
  IEICE Trans. Information and Systems. Vol.J88-DII, No.9
  
  Pages: 1780-1789
- NAID
  110003224132
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese (2005)
- Author(s)
  K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara.
- Journal Title
  
  Journal of Natural Language Processing. Vol.12, No.3
  
  Pages: 3-17
- NAID
  10016629478
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Speaker model selection based on Bayesian information criterion applied to unsupervised speaker indexing (2005)
- Author(s)
  M.Nishida, T.Kawahara
- Journal Title
  
  IEEE Trans.Speech & Audio Process Vol.13・No.4
  
  Pages: 583-592
- NAID
  120002511373
- Related Report
  2005 Annual Research Report
[Journal Article] User modeling in spoken dialogue systems to generate flexible guidance (2005)
- Author(s)
  K.Komatani, S.Ueno, T.Kawahara, H.G.Okuno
- Journal Title
  
  User Modeling and User-Adapted Interaction Vol.15・No.1
  
  Pages: 169-183
- Related Report
  2005 Annual Research Report
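[Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition (2005)
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  IEICE Trans. (Japanese Edition) Vol.J88-DII・No.9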
  
  Pages: 1780-1789
- NAID
  110003224132
- Related Report
  2005 Annual Research Report
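[Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese (2005)
- Author(s)
  K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara
- Journal Title
  
  Journal of Natural Language Processing Vol.12・No.3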
  
  Pages: 3-17
- NAID
  10016629478
- Related Report
  2005 Annual Research Report
[Journal Article] Speaker model selection based on Bayesian information criterion applied to unsupervised speaker indexing (2005)
- Author(s)
  M.Nishida, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Processing 13 (accepted for publication)
- NAID
  120002511373
- Related Report
  2004 Annual Research Report
[Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  IEICE Trans. E88-D, 3
  
  Pages: 439-445
- NAID
  110003214204
- Related Report
  2004 Annual Research Report
[Journal Article] Julius: continuous speech recognition software (2005)
- Author(s)
  T.Kawahara, A.Lee
- Journal Title
  
  Journal of the Japanese Society for Artificial Intelligence 20, 1
  
  Pages: 41-49
- Related Report
  2004 Annual Research Report
[Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)
- Author(s)
  T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  
  Pages: 409-419
- NAID
  120002511374
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)
- Author(s)
  H.Nanjo, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  
  Pages: 391-400
- NAID
  110003171148
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)
- Author(s)
  T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  
  Pages: 409-419
- NAID
  120002511374
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)
- Author(s)
  H.Nanjo, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Process. Vol.12, No.4
  
  Pages: 391-400
- NAID
  110003171148
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)
- Author(s)
  T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
- Journal Title
  
  IEEE Trans. Speech & Audio Processing 12, 4
  
  Pages: 409-419
- NAID
  120002511374
- Related Report
  2004 Annual Research Report
[Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)
- Author(s)
  H.Nanjo, T.Kawahara
- Journal Title
  
  IEEE Trans. Speech & Audio Processing 12, 4
  
  Pages: 391-400
- NAID
  110003171148
- Related Report
  2004 Annual Research Report
[Journal Article] Spoken dialogue systems using conversational speech (2004)
- Author(s)
  T.Kawahara
- Journal Title
  
  IPSJ Magazine 45, 10
  
  Pages: 1027-1031
- Related Report
  2004 Annual Research Report
[Book] Spoken Dialogue Systems (2006)
- Author(s)
  T.Kawahara, M.Araki
- Total Pages
  208
- Publisher
  Ohmsha
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Annual Research Report 2006 Final Research Report Summary
[Book] Spoken Language Systems (2005)
- Author(s)
  Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara, editors
- Total Pages
  347
- Publisher
  Ohmsha/IOS Press
- Related Report
  2005 Annual Research Report
[Patent(Industrial Property Rights)] Voice activity detection device, computer program therefor, and recording medium (2005)
- Inventor(s)
  T.Kawahara, Y.Kida
- Industrial Property Rights Holder
  Kyoto University
- Industrial Property Number
  2005-197804
- Filing Date
  2005-07-06
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary 2005 Annual Research Report
