Automatic Speech Recognition and Understanding of Lectures and Discussions for Effective Multi-media Archiving

Research Project

Project/Area Number 16200011
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution Kyoto University

Principal Investigator

KAWAHARA Tatsuya  Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)

Co-Investigator(Kenkyū-buntansha) MINOH Michihiko  Kyoto University, Academic Center for Computing and Media Studies, Professor (70166099)
FURUI Sadaoki  Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Department of Computer Science, Professor (90293076)
AKITA Yuya  Academic Center for Computing and Media Studies, Assistant Professor (90402742)
Project Period (FY) 2004 – 2006
Project Status Completed (Fiscal Year 2006)
Budget Amount
¥45,110,000 (Direct Cost: ¥34,700,000, Indirect Cost: ¥10,410,000)
Fiscal Year 2006: ¥12,220,000 (Direct Cost: ¥9,400,000, Indirect Cost: ¥2,820,000)
Fiscal Year 2005: ¥14,170,000 (Direct Cost: ¥10,900,000, Indirect Cost: ¥3,270,000)
Fiscal Year 2004: ¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000)
Keywords Speech processing / Speech recognition / Speech archives / Spontaneous speech / Meta-data extraction / Oral presentation / Lecture / Discussion
Research Abstract

We investigated automatic speech recognition and post-processing of the transcripts of oral presentations at academic meetings, lectures at universities, and discussions on TV programs and in parliament.
Spontaneous speech of this kind shows large variation in pronunciation and many colloquial expressions, so handling it requires elaborate modeling and robust statistical training. Moreover, because topics and vocabularies vary widely, the lexicon and language model must also be adapted to each lecture or discussion. For this purpose, we conducted the following studies.
-Generalized statistical modeling of pronunciation variation
-Transformation of language models to spoken style based on a statistical machine translation framework
-Language model adaptation based on PLSA of topics and speakers
-Language model adaptation using slide information
-Language model adaptation based on topic segmentation of meetings
The transcripts (speech recognition results) of spontaneous speech are not suitable for archiving as they are: disfluencies and colloquial expressions must be cleaned up and sentence boundaries marked. We therefore conducted the following studies.
-Sentence boundary detection using dependency structure analysis
-Clause boundary detection using local syntactic dependency
-Detection of quotations and inserted clauses
-Detection and correction of self-repairs
We also conducted the following studies for effective indexing of speech archives.
-Indexing of key sentences in oral presentations
-Alignment of utterances with slides used in lectures

Report

(4 results)
  • 2006 Annual Research Report   Final Research Report Summary
  • 2005 Annual Research Report
  • 2004 Annual Research Report
  • Research Products

    (33 results)

All Journal Article (30 results) Book (2 results) Patent(Industrial Property Rights) (1 results)

  • [Journal Article] Out-of-domain utterance detection using classification confidences of multiple topics (2007)

    • Author(s)
      I.R.Lane, T.Kawahara, T.Matsui, S.Nakamura
    • Journal Title

      IEEE Trans. Audio, Speech & Language Processing Vol. 15, No. 1

      Pages: 150-161

    • NAID

      120002511372

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Intelligent transcription system based on spontaneous speech processing (2007)

    • Author(s)
      T.Kawahara
    • Journal Title

      Proc. Int'l Conference on Informatics Research for Development of Knowledge Society Infrastructure

      Pages: 19-26

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Noise-robust voice activity detection based on weighted integration of multiple features (2006)

    • Author(s)
      木田祐介, 河原達也
    • Journal Title

      電子情報通信学会論文誌 Vol. J89-DII, No. 8

      Pages: 1820-1828

    • NAID

      110002952512

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Dialogue strategy to clarify user's queries for document retrieval system with speech interface (2006)

    • Author(s)
      T.Misu, T.Kawahara
    • Journal Title

      Speech Communication Vol. 48, No. 9

      Pages: 1137-1150

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Efficient estimation of language model statistics of spontaneous speech via statistical transformation model (2006)

    • Author(s)
      Y.Akita, T.Kawahara
    • Journal Title

      Proc. IEEE-ICASSP 1

      Pages: 1049-1052

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Detection of quotations and inserted clauses and its application to dependency structure analysis in (2006)

    • Author(s)
      R.Hamabe, K.Uchimoto, T.Kawahara, H.Isahara
    • Journal Title

      Proc. COLING-ACL

      Pages: 324-330

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Verification of speech recognition results incorporating in-domain confidence and discourse coherence measures (2006)

    • Author(s)
      I.R.Lane, T.Kawahara
    • Journal Title

      IEICE Trans. Vol.E89-D, No.3

      Pages: 931-938

    • NAID

      110004719366

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Trigger-based language model adaptation for automatic transcription of panel discussions (2006)

    • Author(s)
      C.Troncoso, T.Kawahara
    • Journal Title

      IEICE Trans. Vol.E89-D, No.3

      Pages: 1024-1031

    • NAID

      110004719377

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing (2005)

    • Author(s)
      M.Nishida, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.13, No.4

      Pages: 583-592

    • NAID

      120002511373

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)

    • Author(s)
      Y.Akita, T.Kawahara
    • Journal Title

      IEICE Trans. Vol. E88-D, No.3

      Pages: 439-445

    • NAID

      110003214204

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Generalized statistical modeling of pronunciation variations for spontaneous speech recognition (2005)

    • Author(s)
      秋田祐哉, 河原達也
    • Journal Title

      電子情報通信学会論文誌 Vol.J88-DII, No.9

      Pages: 1780-1789

    • NAID

      110003224132

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Interaction between dependency structure analysis and sentence boundary detection in spontaneous Japanese (2005)

    • Author(s)
      下岡和也, 内元清貴, 河原達也, 井佐原均
    • Journal Title

      自然言語処理 Vol.12, No.3

      Pages: 3-17

    • NAID

      10016629478

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing (2005)

    • Author(s)
      M.Nishida, T.Kawahara.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.13, No.4

      Pages: 583-592

    • NAID

      120002511373

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)

    • Author(s)
      Y.Akita, T.Kawahara.
    • Journal Title

      IEICE Trans. Vol.E88-D, No.3

      Pages: 439-445

    • NAID

      110003214204

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition (2005)

    • Author(s)
      Y.Akita, T.Kawahara.
    • Journal Title

      IEICE Trans. Information and Systems. Vol.J88-DII, No.9

      Pages: 1780-1789

    • NAID

      110003224132

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese (2005)

    • Author(s)
      K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara.
    • Journal Title

      Journal of Natural Language Processing. Vol.12, No.3

      Pages: 3-17

    • NAID

      10016629478

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Speaker model selection based on Bayesian information criterion applied to unsupervised speaker indexing (2005)

    • Author(s)
      M.Nishida, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.13, No.4

      Pages: 583-592

    • NAID

      120002511373

    • Related Report
      2005 Annual Research Report
  • [Journal Article] User modeling in spoken dialogue systems to generate flexible guidance (2005)

    • Author(s)
      K.Komatani, S.Ueno, T.Kawahara, H.G.Okuno
    • Journal Title

      User Modeling and User-Adapted Interaction Vol.15, No.1

      Pages: 169-183

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Generalized statistical modeling of pronunciation variations for spontaneous speech recognition (2005)

    • Author(s)
      秋田祐哉, 河原達也
    • Journal Title

      電子情報通信学会論文誌 Vol.J88-DII, No.9

      Pages: 1780-1789

    • NAID

      110003224132

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Interaction between dependency structure analysis and sentence boundary detection in spontaneous Japanese (2005)

    • Author(s)
      下岡和也, 内元清貴, 河原達也, 井佐原均
    • Journal Title

      自然言語処理 Vol.12, No.3

      Pages: 3-17

    • NAID

      10016629478

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Speaker model selection based on Bayesian information criterion applied to unsupervised speaker indexing (2005)

    • Author(s)
      M.Nishida, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Processing 13 (accepted for publication)

    • NAID

      120002511373

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions (2005)

    • Author(s)
      Y.Akita, T.Kawahara
    • Journal Title

      IEICE Trans. E88-D, 3

      Pages: 439-445

    • NAID

      110003214204

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Continuous speech recognition software Julius (2005)

    • Author(s)
      河原達也, 李晃伸
    • Journal Title

      人工知能学会誌 20, 1

      Pages: 41-49

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)

    • Author(s)
      T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 409-419

    • NAID

      120002511374

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)

    • Author(s)
      H.Nanjo, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 391-400

    • NAID

      110003171148

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)

    • Author(s)
      T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 409-419

    • NAID

      120002511374

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)

    • Author(s)
      H.Nanjo, T.Kawahara.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 391-400

    • NAID

      110003171148

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers (2004)

    • Author(s)
      T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
    • Journal Title

      IEEE Trans. Speech & Audio Processing 12, 4

      Pages: 409-419

    • NAID

      120002511374

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition (2004)

    • Author(s)
      H.Nanjo, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Processing 12, 4

      Pages: 391-400

    • NAID

      110003171148

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Spoken dialogue systems using conversational speech (2004)

    • Author(s)
      河原達也
    • Journal Title

      情報処理 45, 10

      Pages: 1027-1031

    • Related Report
      2004 Annual Research Report
  • [Book] Spoken Dialogue Systems (2006)

    • Author(s)
      河原達也, 荒木雅弘
    • Total Pages
      208
    • Publisher
      オーム社
    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Annual Research Report 2006 Final Research Report Summary
  • [Book] Spoken Language Systems (2005)

    • Author(s)
      Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara, editors
    • Total Pages
      347
    • Publisher
      Ohmsha/IOS Press
    • Related Report
      2005 Annual Research Report
  • [Patent(Industrial Property Rights)] Voice activity detection device, computer program therefor, and recording medium (2005)

    • Inventor(s)
      河原達也, 木田祐介
    • Industrial Property Rights Holder
      Kyoto University
    • Industrial Property Number
      2005-197804
    • Filing Date
      2005-07-06
    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2006 Final Research Report Summary 2005 Annual Research Report

Published: 2004-04-01   Modified: 2016-04-21  
