2010 Fiscal Year Final Research Report

A study of multimodal recognition for human communication search

Research Project

Project/Area Number	20300063
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Tokyo Institute of Technology
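Principal Investigator	SHINODA Koichi Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Associate Professor (10343097)
Co-Investigator(Kenkyū-buntansha)	FURUI Sadaoki Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (90293076)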
Project Period (FY)	2008 – 2010
Keywords	Speech recognition / Video recognition / Multimodal recognition / Human communication understanding / Information retrieval
Research Abstract	We developed multimodal pattern recognition techniques for human communication using speech and video. For event extraction, we proposed a statistical technique that uses Gaussian mixture models and support vector machines. We participated in the TRECVID 2010 workshop, where our method ranked fourth among 40 participants from around the world. We also developed new methods for active learning in speech modeling and adaptation, noise-robust speech recognition, signal processing for meeting speech recognition, multimodal pattern recognition, speaker and gesture recognition, speech style analysis, and video summarization.

Research Products
(38 results)

Journal Article (4 results) (of which Peer Reviewed: 4 results) Presentation (33 results) Remarks (1 result)

[Journal Article] Acoustic Model Adaptation for Speech Recognition (2010)
- Author(s)
  篠田浩一
- Journal Title
  IEICE Transactions on Information and Systems Vol.E93-D, No.9
  Pages: 2348-2362
- Peer Reviewed
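[Journal Article] Multimodal high-level feature detection for large-scale video resources (2010)
- Author(s)
  井上中順、斉藤辰彦、篠田浩一、古井貞煕
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition) Vol.J93-D, No.12
  Pages: 2633-2644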
- Peer Reviewed
[Journal Article] Semi-synchronous speech and pen input for mobile user interfaces (2010)
- Author(s)
  Koichi Shinoda, Yasushi Watanabe, Kenji Iwata, Yuan Liang, Ryuta Nakagawa, Sadaoki Furui
- Journal Title
  Speech Communication Vol.53
  Pages: 283-291
- Peer Reviewed
[Journal Article] Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours (2009)
- Author(s)
  Nazrul Effendy, Koichi Shinoda, Sadaoki Furui, Somchai Jitapunkul
- Journal Title
  2009 The Acoustical Society of Japan, Acoust. Sci. & Tech. No.30
  Pages: 249-256
- Peer Reviewed
[Presentation] A training sentence selection method using relative entropy for acoustic model training (2011)
- Author(s)
  村上博子、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan 2011 Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2011-03-09
[Presentation] Voting Approach in SMAP Adaptation for Speaker Verification (2011)
- Author(s)
  Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui
- Organizer
  Acoustical Society of Japan 2011 Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2011-03-09
[Presentation] Analysis of spectral shrinkage in noisy speech and its use for noise-robust speech recognition (2011)
- Author(s)
  別府真由美、篠田浩一、古井貞煕
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Tokyo
- Year and Date
  2011-03-04
[Presentation] TRECVID semantic indexing using a multimodal, multi-frame method (2011)
- Author(s)
  井上中順、上嶋勇祐、篠田浩一
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)
- Place of Presentation
  Saitama
- Year and Date
  2011-02-17
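[Presentation] Training sentence selection using relative entropy for acoustic model training (2011)
- Author(s)
  村上博子、篠田浩一、古井貞煕
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Fukuyama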
- Year and Date
  2011-02-04
[Presentation] Inter-speaker weighted MAP adaptation for GMM-supervector speaker recognition (2010)
- Author(s)
  Marc Ferras, Koichi Shinoda, Sadaoki Furui
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Tokyo
- Year and Date
  2010-12-20
[Presentation] Optimal use of trees in structural MAP adaptation for speaker verification (2010)
- Author(s)
  Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Tokyo
- Year and Date
  2010-12-20
[Presentation] TT+GT at TRECVID 2010 Workshop (2010)
- Author(s)
  Nakamasa Inoue, Toshiya Wada, Yusuke Kamishima, Koichi Shinoda, Ilseo Kim, Byungki Byun, Chin-Hui Lee
- Organizer
  TRECVID 2010 workshop
- Place of Presentation
  Gaithersburg
- Year and Date
  2010-11-15
[Presentation] Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs (2010)
- Author(s)
  Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)
- Place of Presentation
  Chiba
- Year and Date
  2010-10-08
[Presentation] Sound source separation based on spectral subtraction for meeting speech recognition (2010)
- Author(s)
  那須悠、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan 2010 Autumn Meeting
- Place of Presentation
  Osaka
- Year and Date
  2010-09-14
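[Presentation] Multiple kernel learning for generic object recognition using SIFT Gaussian mixture models (2010)
- Author(s)
  井上中順、上嶋勇祐、篠田浩一、古井貞煕
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)
- Place of Presentation
  Fukuoka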
- Year and Date
  2010-09-05
[Presentation] Robust Gait Recognition against Speed Variation (2010)
- Author(s)
  Muhammad Rasyid Aqmar, Koichi Shinoda, Sadaoki Furui
- Organizer
  ICPR2010
- Place of Presentation
  Istanbul
- Year and Date
  2010-08-23
[Presentation] High-Level Feature Extraction Using SIFT GMMs and Audio Models (2010)
- Author(s)
  井上中順、斉藤辰彦、篠田浩一、古井貞煕
- Organizer
  ICPR2010
- Place of Presentation
  Istanbul
- Year and Date
  2010-08-23
[Presentation] 3D sign language recognition using a ToF camera (2010)
- Author(s)
  佐藤新、篠田浩一、古井貞煕
- Organizer
  Meeting on Image Recognition and Understanding (MIRU)
- Place of Presentation
  Kushiro
- Year and Date
  2010-07-27
[Presentation] NIST SRE 2010: Tokyo Tech Speaker Recognition (2010)
- Author(s)
  Marc Ferras, Sangeeta Biswas, Koichi Shinoda, Sadaoki Furui
- Organizer
  NIST 2010 Speaker Recognition Evaluation workshop
- Place of Presentation
  Brno
- Year and Date
  2010-06-24
[Presentation] Online sound source separation based on spectral subtraction for meeting speech recognition (2010)
- Author(s)
  那須悠、篠田浩一、古井貞煕
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Kobe
- Year and Date
  2010-05-26
[Presentation] Speech Modeling Based on Committee-Based Active Learning (2010)
- Author(s)
  濱中悠三、篠田浩一、古井貞煕、江森正、越仲孝文
- Organizer
  ICASSP2010
- Place of Presentation
  Dallas, U.S.A
- Year and Date
  2010-03-14
[Presentation] A study of event detection from video using acoustic features (2010)
- Author(s)
  斉藤辰彦、井上中順、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan 2010 Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2010-03-08
[Presentation] Active learning using multiple recognizers for speech recognition (2009)
- Author(s)
  濱中悠三、江森正、越仲孝文、篠田浩一、古井貞煕
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  Tokyo
- Year and Date
  2009-12-21
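[Presentation] High-level feature detection from video using SIFT Gaussian mixture models and acoustic features (2009)
- Author(s)
  井上中順、斉藤辰彦、篠田浩一、古井貞煕
- Organizer
  IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)
- Place of Presentation
  Kanazawa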
- Year and Date
  2009-11-26
[Presentation] TITGT at TRECVID 2009 Workshop (2009)
- Author(s)
  Nakamasa Inoue, Shanshan Hao, Tatsuhiko Saito, Koichi Shinoda, Ilseo Kim, Chin-Hui Lee
- Organizer
  TRECVID Workshop (TRECVID 2009)
- Place of Presentation
  Gaithersburg
- Year and Date
  2009-11-16
[Presentation] Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform (2009)
- Author(s)
  安井英己、篠田浩一、古井貞煕、岩野公司
- Organizer
  Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference
- Place of Presentation
  Sapporo, Japan
- Year and Date
  2009-10-05
[Presentation] Committee-based active learning for speech recognition (2009)
- Author(s)
  濱中悠三、江森正、越仲孝文、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Koriyama
- Year and Date
  2009-09-15
[Presentation] Speaker Adaptation Based on Two-Step Active Learning (2009)
- Author(s)
  村上博子、篠田浩一、古井貞煕
- Organizer
  INTERSPEECH 2009
- Place of Presentation
  Brighton, UK
- Year and Date
  2009-09-06
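[Presentation] A study on improving noise-robust speech recognition using fundamental frequency information extracted by the Hough transform (2009)
- Author(s)
  安井英己、篠田浩一、古井貞煕、岩野公司
- Organizer
  Acoustical Society of Japan 2009 Spring Meeting
- Place of Presentation
  Tokyo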
- Year and Date
  2009-03-17
[Presentation] Speaker adaptation based on active selection of adaptation sentences (2009)
- Author(s)
  村上博子、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan 2009 Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2009-03-17
[Presentation] Video summarization using automatic estimation of the number of scenes by statistical model selection (2009)
- Author(s)
  山崎航史、篠田浩一、古井貞煕
- Organizer
  IEICE Technical Report
- Place of Presentation
  Tokyo
- Year and Date
  2009-02-19
[Presentation] Gait Recognition Using CHLAC Features and Hidden Markov Models (2009)
- Author(s)
  M.-R. Aqmar, K. Shinoda, S. Furui
- Organizer
  IEICE Technical Report
- Place of Presentation
  Tokyo
- Year and Date
  2009-02-19
[Presentation] Speeding up fundamental frequency information extraction by the Hough transform for noise-robust speech recognition (2009)
- Author(s)
  安井英己、篠田浩一、古井貞煕、岩野公司
- Organizer
  IEICE Technical Report
- Place of Presentation
  Nara
- Year and Date
  2009-01-12
[Presentation] Tokyo Tech at TRECVID 2008 (2008)
- Author(s)
  S. Hao, Y. Yoshizawa, K. Yamasaki, K. Shinoda, S. Furui
- Organizer
  TRECVID 2008 workshop
- Place of Presentation
  Washington D.C., USA
- Year and Date
  2008-11-17
[Presentation] Automatically Estimating Number of Scenes for Rushes Summarization (2008)
- Author(s)
  山崎航史、篠田浩一、古井貞煕
- Organizer
  TRECVID BBC Rushes Summarization Workshop (TVS 2008)
- Place of Presentation
  ACM Multimedia, New York, USA
- Year and Date
  2008-10-31
[Presentation] Time-lag Adaptation for Semi-synchronous Speech and Pen Input (2008)
- Author(s)
  Yasushi Watanabe, Koichi Shinoda, Sadaoki Furui
- Organizer
  INTERSPEECH 2008
- Place of Presentation
  Brisbane, Australia
- Year and Date
  2008-09-22
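[Presentation] Noise-robust speech recognition using spectral subtraction and fundamental frequency information extracted by the Hough transform (2008)
- Author(s)
  安井英己、岩野公司、篠田浩一、古井貞煕
- Organizer
  Acoustical Society of Japan
- Place of Presentation
  Kyushu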
- Year and Date
  2008-09-10
[Remarks] Website, etc.
- URL
  http://www.ks.cs.titech.ac.jp
