Spoken Language Processing Based on Non-Extensive Information Theory

Research Project

Project/Area Number	24650079
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Perception information processing/Intelligent robotics
Research Institution	Tokyo Institute of Technology
Principal Investigator	SHINODA KOICHI, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (10343097)
Project Period (FY)	2012-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000); Fiscal Year 2014: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2013: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2012: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	speech information processing / video information processing / image information processing
Outline of Final Research Achievements	We developed a methodology for spoken language processing based on non-extensive statistics, a generalization of conventional extensive statistics. First, we developed q-log spectral subtraction (q-LMSN) to improve robustness against differences in environmental noise and in channels, and showed that it performed significantly better than conventional cepstral mean normalization (CMN). Next, we developed a recognition method that uses q-Gaussian mixtures for the output probabilities of Gaussian mixture models (GMMs) and hidden Markov models (HMMs). We applied it to speech recognition and to video semantic indexing and confirmed its effectiveness.
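
For readers unfamiliar with the terminology, the following is a minimal sketch of the standard Tsallis q-logarithm and q-exponential that underlie q-log features and q-Gaussian densities in non-extensive statistics. It is not the project's implementation: the function names, the parameters `mu` and `beta`, and the unnormalized form of the density are assumptions chosen for illustration, and the details of q-LMSN and of the q-Gaussian mixture training are not given in this record.

```python
import math


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm; reduces to the natural log as q -> 1."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(x: float, q: float) -> float:
    """Tsallis q-exponential, the inverse of q_log where it is defined."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Cut-off convention: zero for q < 1, divergence for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def q_gaussian_unnormalized(x: float, q: float, mu: float = 0.0, beta: float = 1.0) -> float:
    """Unnormalized q-Gaussian shape exp_q(-beta * (x - mu)^2).

    For q = 1 this is the ordinary Gaussian shape; for q > 1 the tails
    are heavier, and for q < 1 the support is bounded.
    """
    return q_exp(-beta * (x - mu) ** 2, q)
```

With q = 1 both functions fall back to the ordinary logarithm and exponential, which is the sense in which the non-extensive formulation generalizes the conventional one.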

Report

(4 results)

2014 Annual Research Report, Final Research Report (PDF)
2013 Research-status Report
2012 Research-status Report

Research Products
(6 results)

Journal Article (2 results, of which Peer Reviewed: 1), Presentation (4 results, of which Invited: 1)

[Journal Article] q-Gaussian Mixture Models for Image and Video Semantic Indexing (2013)
- Author(s)
  Nakamasa Inoue, Koichi Shinoda
- Journal Title
  Journal of Visual Communication and Image Representation
  Volume: 24 Issue: 8 Pages: 1450-1457
- DOI
  10.1016/j.jvcir.2013.10.005
- NAID
  120006582288
- Related Report
  2013 Research-status Report
[Journal Article] Feature normalization based on non-extensive statistics for speech recognition (2013)
- Author(s)
  Hilman F. Pardede, Koji Iwano, Koichi Shinoda
- Journal Title
  Speech Communication
  Volume: 55 Pages: 587-599
- NAID
  120006582242
- Related Report
  2012 Research-status Report
- Peer Reviewed
[Presentation] TokyoTech-Waseda at TRECVID 2014 (2014)
- Author(s)
  Nakamasa Inoue, Zhuolin Liang, Mengxi Lin, Tran Hai Dang, Koichi Shinoda, Zhang Xuefeng, Kazuya Ueki
- Organizer
  Proc. TRECVID workshop
- Place of Presentation
  University of Central Florida (USA)
- Year and Date
  2014-11-10 – 2014-11-12
- Related Report
  2014 Annual Research Report
[Presentation] Robust Video Information Retrieval using Speech Technologies (2014)
- Author(s)
  Koichi Shinoda
- Organizer
  APSIPA distinguished lecture
- Place of Presentation
  Carnegie Mellon University (USA)
- Year and Date
  2014-06-20
- Related Report
  2014 Annual Research Report
- Invited
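[Presentation] Acoustic Models Using q-Gaussian Distributions for Speech Recognition (2013)
- Author(s)
  周澤西, Koji Iwano, Koichi Shinoda
- Organizer
  Acoustical Society of Japan, 2013 Spring Meeting
- Place of Presentation
  Tokyo University of Technology, Hachioji, Tokyo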
- Related Report
  2012 Research-status Report
[Presentation] Q-Gaussian based spectral subtraction for robust speech recognition (2012)
- Author(s)
  Hilman F. Pardede, Koichi Shinoda and Koji Iwano
- Organizer
  INTERSPEECH 2012
- Place of Presentation
  Portland, OR, U.S.A
- Related Report
  2012 Research-status Report
