Structure Extraction and Visualization of Spontaneous Speech Communication

Research Project

Project/Area Number	19300061
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Kyoto University
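Principal Investigator	KAWAHARA Tatsuya Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
Co-Investigator(Kenkyū-buntansha)	NAKAMURA Yuichi Kyoto University, Academic Center for Computing and Media Studies, Professor (40227947); AKITA Yuya Kyoto University, Academic Center for Computing and Media Studies, Assistant Professor (90402742); UCHIMOTO Kiyotaka National Institute of Information and Communications Technology, Knowledge Creating Communication Research Center, Senior Researcher (60358885); MORI Shinsuke Kyoto University, Academic Center for Computing and Media Studies, Associate Professor (90456773)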
Project Period (FY)	2007 – 2009
Project Status	Completed (Fiscal Year 2009)
Budget Amount	¥17,940,000 (Direct Cost: ¥13,800,000, Indirect Cost: ¥4,140,000); Fiscal Year 2009: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000); Fiscal Year 2008: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000); Fiscal Year 2007: ¥7,020,000 (Direct Cost: ¥5,400,000, Indirect Cost: ¥1,620,000)
Keywords	spoken language processing / spontaneous speech / speech recognition / language analysis / metadata annotation / media retrieval / video analysis
Research Abstract	For effective exploitation of large-scale audio archives such as lectures, conferences and meetings, we investigate automatic speech recognition of these kinds of spontaneous speech communication, as well as extraction of linguistic structures and effective presentation. Automatic transcription systems for academic lectures, classroom lectures and parliamentary meetings are implemented.

Report

(4 results)

2009 Annual Research Report Final Research Report ( PDF )
2008 Annual Research Report
2007 Annual Research Report

Research Products
(60 results)

Journal Article (23 results, of which Peer Reviewed: 9 results) / Presentation (34 results) / Book (2 results) / Patent (Industrial Property Rights) (1 result)

[Journal Article] Online unsupervised classification with model comparison in the Variational Bayes framework for voice activity detection.2010
- Author(s)
  D. Cournapeau, S. Watanabe, A. Nakamura, T. Kawahara
- Journal Title
  
  IEEE J. Selected Topics in Signal Processing (accepted for publication)
- NAID
  120002598753
- Related Report
  2009 Final Research Report
[Journal Article] Gaussian mixture optimization based on efficient cross-validation.2010
- Author(s)
  T. Shinozaki, S. Furui, T. Kawahara
- Journal Title
  
  IEEE J. Selected Topics in Signal Processing (accepted for publication)
- NAID
  110006381954
- Related Report
  2009 Final Research Report
[Journal Article] Statistical transformation of language and pronunciation models for spontaneous speech recognition.2010
- Author(s)
  Y. Akita, T. Kawahara
- Journal Title
  
  IEEE Trans. Audio, Speech & Language Process. (accepted for publication)
- NAID
  120002511319
- Related Report
  2009 Final Research Report
[Journal Article] Speech activity detection for multi-party conversation analyses based on likelihood ratio test on spatial magnitude estimation.2010
- Author(s)
  K. Ishizuka, S. Araki, T. Kawahara
- Journal Title
  
  IEEE Trans. Audio, Speech & Language Process. Vol.18(accepted for publication)
- Related Report
  2009 Final Research Report
[Journal Article] Bayes risk-based dialogue management for document retrieval system with speech interface.2010
- Author(s)
  T. Misu, T. Kawahara
- Journal Title
  
  Speech Communication Vol.52,No.1
  
  Pages: 61-71
- Related Report
  2009 Final Research Report
[Journal Article] Online unsupervised classification with model comparison in the Variational Bayes framework for voice activity detection2010
- Author(s)
  D.Cournapeau, S.Watanabe, A.Nakamura, T.Kawahara
- Journal Title
  
  IEEE J. Selected Topics in Signal Processing (accepted for publication)
- NAID
  120002598753
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Statistical transformation of language and pronunciation models for spontaneous speech recognition2010
- Author(s)
  Y.Akita, T.Kawahara
- Journal Title
  
  IEEE Trans. Audio, Speech & Language Processing Vol. 18 (accepted for publication)
- NAID
  120002511319
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Speech activity detection for multi-party conversation analyses based on likelihood ratio test on spatial magnitude estimation2010
- Author(s)
  K.Ishizuka, S.Araki, T.Kawahara
- Journal Title
  
  IEEE Trans. Audio, Speech & Language Processing Vol. 18 (accepted for publication)
- Related Report
  2009 Annual Research Report
- Peer Reviewed
[Journal Article] Effective prediction of errors by non-native speakers using decision tree for speech recognition-based CALL system.2009
- Author(s)
  H. Wang, T. Kawahara
- Journal Title
  
  IEICE Trans. Vol.E92-D,No.12
  
  Pages: 2462-2468
- NAID
  10026812661
- Related Report
  2009 Final Research Report
[Journal Article] Computer assisted language learning system based on dynamic question generation and error prediction for automatic speech recognition.2009
- Author(s)
  H. Wang, C.J. Waple, T. Kawahara
- Journal Title
  
  Speech Communication Vol.51,No.10
  
  Pages: 995-1005
- Related Report
  2009 Final Research Report
[Journal Article] Estimation of clause and sentence boundaries in spontaneous speech using local dependency information.2009
- Author(s)
  西光雅弘, 秋田祐哉, 高梨克也, 尾嶋憲治, 河原達也
- Journal Title
  
  情報処理学会論文誌 Vol.50,No.2
  
  Pages: 544-552
- NAID
  110007970350
- Related Report
  2009 Final Research Report
[Journal Article] Lecture speech recognition by language model adaptation using slide information.2009
- Author(s)
  河原達也, 根本雄介, 勝丸徳浩, 秋田祐哉
- Journal Title
  
  情報処理学会論文誌 Vol.50,No.2
  
  Pages: 469-476
- NAID
  110007970343
- Related Report
  2009 Final Research Report
[Journal Article] Automatic identification of quotation and inserted clauses in spontaneous speech and its application to dependency analysis.2009
- Author(s)
  浜辺良二, 内元清貴, 河原達也, 井佐原均
- Journal Title
  
  自然言語処理 Vol.16,No.1
  
  Pages: 3-23
- NAID
  10024758516
- Related Report
  2009 Final Research Report
[Journal Article] Estimation of clause and sentence boundaries in spontaneous speech using local dependency information.2009
- Author(s)
  西光雅弘, 秋田祐哉, 高梨克也, 尾嶋憲治, 河原達也.
- Journal Title
  
  情報処理学会論文誌 Vol. 50, No. 2
  
  Pages: 544-552
- NAID
  110007970350
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Lecture speech recognition by language model adaptation using slide information.2009
- Author(s)
  河原達也, 根本雄介, 勝丸徳浩, 秋田祐哉.
- Journal Title
  
  情報処理学会論文誌 Vol. 50, No. 2
  
  Pages: 469-476
- NAID
  110007970343
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Automatic identification of quotation and inserted clauses in spontaneous speech and its application to dependency analysis.2009
- Author(s)
  浜辺良二, 内元清貴, 河原達也, 井佐原均.
- Journal Title
  
  自然言語処理 Vol. 16, No. 1
  
  Pages: 3-23
- NAID
  10024758516
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Voice activity detection based on high order statistics and online EM algorithm.2008
- Author(s)
  D. Cournapeau, T. Kawahara
- Journal Title
  
  IEICE Trans. Vol.E91-D,No.12
  
  Pages: 2854-2861
- NAID
  10026806855
- Related Report
  2009 Final Research Report
[Journal Article] Speech recognition based on a Bayes risk minimization framework oriented toward speech understanding.2008
- Author(s)
  南條浩輝, 河原達也, 七里崇
- Journal Title
  
  電子情報通信学会論文誌 Vol.J91-D,No.5
  
  Pages: 1314-1324
- NAID
  110007380122
- Related Report
  2009 Final Research Report
[Journal Article] Speech recognition based on a Bayes risk minimization framework oriented toward speech understanding.2008
- Author(s)
  南條浩輝, 河原達也, 七里崇.
- Journal Title
  
  電子情報通信学会論文誌 J91-D
  
  Pages: 1314-1324
- NAID
  110007380122
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] Speech-based information guidance system with question-answering and information recommendation functions.2007
- Author(s)
  翠輝久, 河原達也, 正司哲朗, 美濃導彦
- Journal Title
  
  情報処理学会論文誌 Vol.48,No.12
  
  Pages: 3602-3611
- NAID
  110006531940
- Related Report
  2009 Final Research Report
[Journal Article] Construction of language models for spoken dialogue systems by selecting web texts with consideration of domain and style.2007
- Author(s)
  翠輝久, 河原達也
- Journal Title
  
  電子情報通信学会論文誌 Vol.J90-D,No.11
  
  Pages: 3024-3032
- NAID
  110007380619
- Related Report
  2009 Final Research Report
[Journal Article] Speech-based information guidance system with question-answering and information recommendation functions.2007
- Author(s)
  翠輝久, 河原達也, 正司哲朗, 美濃導彦.
- Journal Title
  
  情報処理学会論文誌 48
  
  Pages: 3602-3611
- NAID
  110006531940
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] Construction of language models for spoken dialogue systems by selecting web texts with consideration of domain and style.2007
- Author(s)
  翠輝久, 河原達也.
- Journal Title
  
  電子情報通信学会論文誌 J90-D
  
  Pages: 3024-3032
- NAID
  110007380619
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Presentation] Improved statistical models for SMT-based speaking style transformation.2010
- Author(s)
  G. Neubig, Y. Akita, S. Mori, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Dallas, USA
- Related Report
  2009 Final Research Report
[Presentation] Optimizing spectral subtraction and Wiener filtering for robust speech recognition in reverberant and noisy conditions.2010
- Author(s)
  R. Gomez, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Dallas, USA
- Related Report
  2009 Final Research Report
[Presentation] Using online model comparison in the Variational Bayes framework for online unsupervised voice activity detection.2010
- Author(s)
  D. Cournapeau, S. Watanabe, A. Nakamura, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Dallas, USA
- Related Report
  2009 Final Research Report
[Presentation] Transcription system using automatic speech recognition for the Japanese parliament (Diet)2009
- Author(s)
  T.Kawahara
- Organizer
  INTERSTENO
- Place of Presentation
  Beijing, China (invited talk)
- Year and Date
  2009-08-19
- Related Report
  2009 Annual Research Report
[Presentation] New perspectives on spoken language understanding: Does machine need to fully understand speech?2009
- Author(s)
  T. Kawahara
- Organizer
  In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding
- Place of Presentation
  Merano, Italy
- Related Report
  2009 Final Research Report
[Presentation] Tight integration of dereverberation and automatic speech recognition.2009
- Author(s)
  R. Gomez, T. Kawahara
- Organizer
  In Proc. APSIPA ASC
- Place of Presentation
  Sapporo
- Related Report
  2009 Final Research Report
[Presentation] Recent development of open-source speech recognition engine Julius.2009
- Author(s)
  A. Lee, T. Kawahara
- Organizer
  In Proc. APSIPA ASC
- Place of Presentation
  Sapporo
- Related Report
  2009 Final Research Report
[Presentation] A WFST-based log-linear framework for speaking-style transformation.2009
- Author(s)
  G. Neubig, S. Mori, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brighton, UK
- Related Report
  2009 Final Research Report
[Presentation] Optimization of dereverberation parameters based on likelihood of speech recognizer.2009
- Author(s)
  R. Gomez, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brighton, UK
- Related Report
  2009 Final Research Report
[Presentation] Acoustic event detection for spotting hot spots in podcasts.2009
- Author(s)
  K. Sumi, T. Kawahara, J. Ogata, M. Goto.
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brighton, UK
- Related Report
  2009 Final Research Report
[Presentation] Automatic transcription system for meetings of the Japanese.2009
- Author(s)
  Y. Akita, M. Mimura, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brighton, UK
- Related Report
  2009 Final Research Report
[Presentation] Language model transformation applied to lightly supervised training of acoustic model for congress meetings.2009
- Author(s)
  T. Kawahara, M. Mimura, Y. Akita
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Taipei
- Related Report
  2009 Final Research Report
[Presentation] Automatic lecture transcription by exploiting presentation slide information for language model adaptation.2008
- Author(s)
  T. Kawahara, Y. Nemoto, Y. Akita.
- Organizer
  IEEE-ICASSP
- Place of Presentation
  Las Vegas, USA
- Year and Date
  2008-04-01
- Related Report
  2008 Annual Research Report
[Presentation] Extracting word-pronunciation pairs from comparable set of text and speech.2008
- Author(s)
  T. Sasada, S. Mori, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brisbane, Australia
- Related Report
  2009 Final Research Report
[Presentation] A Japanese CALL system based on dynamic question generation and error prediction for ASR.2008
- Author(s)
  H. Wang, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brisbane, Australia
- Related Report
  2009 Final Research Report
[Presentation] Detection of feeling through back-channels in spoken dialogue.2008
- Author(s)
  T. Kawahara, M. Toyokura, T. Misu, C. Hori
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brisbane, Australia
- Related Report
  2009 Final Research Report
[Presentation] Multi-modal recording, analysis and indexing of poster sessions.2008
- Author(s)
  T. Kawahara, H. Setoguchi, K. Takanashi, K. Ishizuka, S. Araki.
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brisbane, Australia
- Related Report
  2009 Final Research Report
[Presentation] Statistical speech activity detection based on spatial power distribution for analyses of poster presentations.2008
- Author(s)
  K. Ishizuka, S. Araki, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brisbane, Australia
- Related Report
  2009 Final Research Report
[Presentation] Bayes risk-based dialogue management for document retrieval system with speech interface.2008
- Author(s)
  T. Misu, T. Kawahara
- Organizer
  In Proc. COLING, Vol. Posters & Demo.
- Place of Presentation
  Manchester, UK
- Related Report
  2009 Final Research Report
[Presentation] Effective error prediction using decision tree for ASR grammar network in CALL system.2008
- Author(s)
  H. Wang, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Las Vegas, USA
- Related Report
  2009 Final Research Report
[Presentation] Automatic lecture transcription by exploiting presentation slide information for language model adaptation.2008
- Author(s)
  T. Kawahara, Y. Nemoto, Y. Akita
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Las Vegas, USA
- Related Report
  2009 Final Research Report
[Presentation] Using Variational Bayes Free Energy for unsupervised voice activity detection.2008
- Author(s)
  D. Cournapeau, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Las Vegas, USA
- Related Report
  2009 Final Research Report
[Presentation] GMM and HMM training by aggregated EM algorithm with increased ensemble sizes for robust parameter estimation.2008
- Author(s)
  T. Shinozaki, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Las Vegas, USA
- Related Report
  2009 Final Research Report
[Presentation] Speech-based interactive information guidance system using question-answering technique.2007
- Author(s)
  T. Misu and T. Kawahara.
- Organizer
  IEEE-ICASSP
- Place of Presentation
  USA
- Year and Date
  2007-04-18
- Related Report
  2007 Annual Research Report
[Presentation] HMM training based on CV-EM and CV Gaussian mixture optimization.2007
- Author(s)
  T. Shinozaki, T. Kawahara
- Organizer
  In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding
- Place of Presentation
  Kyoto
- Related Report
  2009 Final Research Report
[Presentation] Evaluation of real-time voice activity detection based on high order statistics.2007
- Author(s)
  D. Cournapeau, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brussels, Belgium
- Related Report
  2009 Final Research Report
[Presentation] Bayes risk-based optimization of dialogue management for document retrieval system with speech interface.2007
- Author(s)
  T. Misu, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brussels, Belgium
- Related Report
  2009 Final Research Report
[Presentation] Evaluating and optimizing Japanese tutor system featuring dynamic question generation and interactive guidance.2007
- Author(s)
  C. Waple, H. Wang, T. Kawahara, Y. Tsubota, M. Dantsuji
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brussels, Belgium
- Related Report
  2009 Final Research Report
[Presentation] Gaussian mixture optimization for HMM based on efficient cross-validation.2007
- Author(s)
  T. Shinozaki, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brussels, Belgium
- Related Report
  2009 Final Research Report
[Presentation] PLSA-based topic detection in meetings for adaptation of lexicon and language model.2007
- Author(s)
  Y. Akita, Y. Nemoto, T. Kawahara
- Organizer
  In Proc. INTERSPEECH
- Place of Presentation
  Brussels, Belgium
- Related Report
  2009 Final Research Report
[Presentation] An interactive framework for document retrieval and presentation with question-answering function in restricted domain.2007
- Author(s)
  T. Misu, T. Kawahara
- Organizer
  In Proc. IEA/AIE
- Place of Presentation
  Kyoto
- Related Report
  2009 Final Research Report
[Presentation] Speech-based interactive information guidance system using question-answering technique.2007
- Author(s)
  T. Misu, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Honolulu, USA
- Related Report
  2009 Final Research Report
[Presentation] Automatic detection of sentence and clause units using local syntactic dependency.2007
- Author(s)
  T. Kawahara, M. Saikou, K. Takanashi
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Honolulu, USA
- Related Report
  2009 Final Research Report
[Presentation] Topic-independent speaking-style transformation of language model for spontaneous speech recognition.2007
- Author(s)
  Y. Akita, T. Kawahara
- Organizer
  In Proc. IEEE-ICASSP
- Place of Presentation
  Honolulu, USA
- Related Report
  2009 Final Research Report
[Book]2008
- Author(s)
  S. Furui, T. Kawahara
- Publisher
  Springer
- Related Report
  2009 Final Research Report
[Book] Springer Handbook of Speech Processing2008
- Author(s)
  Sadaoki Furui and Tatsuya Kawahara
- Publisher
  Springer
- Related Report
  2007 Annual Research Report
[Patent(Industrial Property Rights)] Acoustic model training apparatus, speech recognition apparatus, and computer program for acoustic model training2009
- Inventor(s)
  三村正人, 河原達也
- Industrial Property Rights Holder
  Kyoto University
- Industrial Property Number
  2009-094212
- Filing Date
  2009-04-08
- Related Report
  2009 Annual Research Report 2009 Final Research Report
