
2006 Fiscal Year Final Research Report Summary

Automatic Speech Recognition and Understanding of Lectures and Discussions for Effective Multi-media Archiving

Research Project

Project/Area Number 16200011
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution Kyoto University

Principal Investigator

KAWAHARA Tatsuya  Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)

Co-Investigator (Kenkyū-buntansha) MINOH Michihiko  Kyoto University, Academic Center for Computing and Media Studies, Professor (70166099)
FURUI Sadaoki  Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Department of Computer Science, Professor (90293076)
AKITA Yuya  Academic Center for Computing and Media Studies, Assistant Professor (90402742)
Project Period (FY) 2004 – 2006
Keywords Speech processing / Speech recognition / Speech archives / Spontaneous speech / Meta-data extraction / Oral presentation / Lecture / Discussion
Research Abstract

We investigated automatic speech recognition and post-processing of the transcripts of oral presentations at academic meetings, lectures at universities, and discussions on TV programs and in parliament.
In these kinds of spontaneous speech, pronunciation varies widely and colloquial expressions are common, so elaborate modeling and robust statistical training are necessary. Moreover, since topics and vocabularies also vary widely, the lexicon and language model must be adapted to each lecture or discussion. For this purpose, we conducted the following studies.
-Generalized statistical modeling of pronunciation variation
-Transformation of the language model to spoken style based on a statistical machine translation framework
-Language model adaptation based on PLSA of topics and speakers
-Language model adaptation using slide information
-Language model adaptation based on topic segmentation of meetings
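As a rough illustration of what adapting a language model to a single lecture can mean, the sketch below linearly interpolates a baseline unigram model with word statistics taken from slide text. This is a generic textbook technique and not necessarily the method used in the project; the function names, the toy vocabulary, and the interpolation weight of 0.3 are all assumptions made for the example.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def adapt(baseline, slide_tokens, weight=0.3):
    """Linearly interpolate a baseline unigram model with slide-text statistics.

    `weight` is the share given to the slide-derived model (an assumed value).
    """
    slide = unigram_probs(slide_tokens)
    vocab = set(baseline) | set(slide)
    return {
        word: (1 - weight) * baseline.get(word, 0.0) + weight * slide.get(word, 0.0)
        for word in vocab
    }


# Toy data (hypothetical): a tiny baseline model and words from one slide.
baseline = {"the": 0.5, "model": 0.3, "speech": 0.2}
slide_tokens = ["speech", "recognition", "speech", "model"]
adapted = adapt(baseline, slide_tokens)
```

Words that appear only on the slide, such as "recognition" here, receive nonzero probability in the adapted model, and the interpolated values still sum to one.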
The transcripts (speech recognition results) of spontaneous speech are not appropriate for archiving as they are. It is necessary to clean disfluencies and colloquial expressions and to mark sentence boundaries. Thus, we conducted the following studies.
-Sentence boundary detection using dependency structure analysis
-Clause boundary detection using local syntactic dependency
-Detection of quotations and inserted clauses
-Detection and correction of self-repairs
We also conducted the following studies for effective indexing of speech archives.
-Indexing of key sentences in oral presentations
-Alignment of utterances with slides used in lectures

  • Research Products

    (14 results)


Journal Article (12 results), Book (1 result), Patent (Industrial Property Rights) (1 result)

  • [Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. 2005

    • Author(s)
      M.Nishida, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.13, No.4

      Pages: 583-592

    • Description
      From the Research Report Summary (Japanese version)
  • [Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions. 2005

    • Author(s)
      Y.Akita, T.Kawahara
    • Journal Title

      IEICE Trans. Vol. E88-D, No.3

      Pages: 439-445

    • Description
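      From the Research Report Summary (Japanese version)
  • [Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition. 2005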

    • Author(s)
      Y.Akita, T.Kawahara
    • Journal Title

      IEICE Trans. Information and Systems. Vol.J88-DII, No.9

      Pages: 1780-1789

    • Description
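      From the Research Report Summary (Japanese version)
  • [Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese. 2005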

    • Author(s)
      K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara
    • Journal Title

      Journal of Natural Language Processing. Vol.12, No.3

      Pages: 3-17

    • Description
      From the Research Report Summary (Japanese version)
  • [Journal Article] Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing. 2005

    • Author(s)
      M.Nishida, T.Kawahara.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.13, No.4

      Pages: 583-592

    • Description
      From the Research Report Summary (English version)
  • [Journal Article] Language model adaptation based on PLSA of topics and speakers for automatic transcription of panel discussions. 2005

    • Author(s)
      Y.Akita, T.Kawahara.
    • Journal Title

      IEICE Trans. Vol.E88-D, No.3

      Pages: 439-445

    • Description
      From the Research Report Summary (English version)
  • [Journal Article] Generalized Statistical Modeling of Pronunciation Variations for Spontaneous Speech Recognition. 2005

    • Author(s)
      Y.Akita, T.Kawahara.
    • Journal Title

      IEICE Trans. Information and Systems. Vol.J88-DII, No.9

      Pages: 1780-1789

    • Description
      From the Research Report Summary (English version)
  • [Journal Article] Interaction between Dependency Structure Analysis and Sentence Boundary Detection in Spontaneous Japanese. 2005

    • Author(s)
      K.Shitaoka, K.Uchimoto, T.Kawahara, H.Isahara.
    • Journal Title

      Journal of Natural Language Processing. Vol.12, No.3

      Pages: 3-17

    • Description
      From the Research Report Summary (English version)
  • [Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers. 2004

    • Author(s)
      T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 409-419

    • Description
      From the Research Report Summary (Japanese version)
  • [Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition. 2004

    • Author(s)
      H.Nanjo, T.Kawahara
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 391-400

    • Description
      From the Research Report Summary (Japanese version)
  • [Journal Article] Automatic indexing of lecture presentations using unsupervised learning of presumed discourse markers. 2004

    • Author(s)
      T.Kawahara, M.Hasegawa, K.Shitaoka, T.Kitade, H.Nanjo.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 409-419

    • Description
      From the Research Report Summary (English version)
  • [Journal Article] Language model and speaking rate adaptation for spontaneous presentation speech recognition. 2004

    • Author(s)
      H.Nanjo, T.Kawahara.
    • Journal Title

      IEEE Trans. Speech & Audio Process. Vol.12, No.4

      Pages: 391-400

    • Description
      From the Research Report Summary (English version)
  • [Book] Spoken Dialogue Systems. 2006

    • Author(s)
      T.Kawahara, M.Araki
    • Total Pages
      208
    • Publisher
      Ohmsha
    • Description
      From the Research Report Summary (Japanese version)
  • [Patent(Industrial Property Rights)] Speech segment detection device, computer program therefor, and recording medium. 2005

    • Inventor(s)
      T.Kawahara, Y.Kida
    • Industrial Property Rights Holder
      Kyoto University
    • Industrial Property Number
      Japanese Patent Application No. 2005-197804
    • Filing Date
      2005-07-06
    • Description
      From the Research Report Summary (Japanese version)


Published: 2008-05-27  
