Advanced indexing based on spoken document retrieval and its feedback

Research Project

Project/Area Number	25330128
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Multimedia database
Research Institution	Shizuoka University
Principal Investigator	KAI ATSUHIKO Shizuoka University, Faculty of Engineering, Associate Professor (60283496)
Co-Investigator (Kenkyū-buntansha)	WANG Longbiao Nagaoka University of Technology, 技学研究院, Associate Professor (30510458)
Co-Investigator (Renkei-kenkyūsha)	KOGURE Satoru Shizuoka University, Faculty of Informatics, Lecturer (40359758)
Project Period (FY)	2013-04-01 – 2016-03-31
Project Status	Completed (Fiscal Year 2015)
Budget Amount	¥4,940,000 (Direct Cost: ¥3,800,000, Indirect Cost: ¥1,140,000); Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2013: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
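Keywords	spoken document retrieval / spoken term detection / STD / spoken query / DNN / speech recognition confidence / score normalization / voice activity detection / noisy and reverberant environments / dereverberation / recognition accuracy estimation / VAD / speaker recognition / confidence measure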
Outline of Final Research Achievements	We investigated and developed elemental technologies for indexing and related processes, designed to permit efficient and sustainable development of spoken document retrieval systems. To handle variation in speech features caused by recording conditions and speakers, we proposed DNN-based voice activity detection (VAD) and dereverberation models as a front end for speaker diarization and speech recognition systems, and improved the accuracy of those systems. We also proposed DNN-based feature transformation as a rescoring step in a spoken term detection (STD) system to cope with out-of-vocabulary words, which significantly improved STD performance.

Report

(4 results)

2015 Annual Research Report, Final Research Report (PDF)
2014 Research-status Report
2013 Research-status Report

Research Products
(20 results)


Journal Article (10 results) (of which Peer Reviewed: 9 results, Open Access: 3 results, Acknowledgement Compliant: 1 result); Presentation (10 results) (of which Int'l Joint Research: 1 result)

[Journal Article] Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition (2015)
- Author(s)
  Bo Ren, Longbiao Wang, Liang Lu, Yuma Ueda, Atsuhiko Kai
- Journal Title
  Multimedia Tools and Applications
  Volume: 75 Pages: 1-16
- Related Report
  2015 Annual Research Report
- Peer Reviewed
[Journal Article] Environment-dependent denoising autoencoder for distant-talking speech recognition (2015)
- Author(s)
  Y. Ueda, L. Wang, A. Kai, B. Ren
- Journal Title
  EURASIP Journal on Advances in Signal Processing
  Volume: 2015:92 Issue: 1 Pages: 1-11
- DOI
  10.1186/s13634-015-0278-y
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation (2014)
- Author(s)
  Zhaofeng Zhang, Longbiao Wang and Atsuhiko Kai
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2014:15 Issue: 1 Pages: 1-12
- DOI
  10.1186/1687-4722-2014-15
- Related Report
  2014 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Combining Subword and State-level Dissimilarity Measures for Improved Spoken Term Detection in NTCIR-11 SpokenQuery&Doc Task (2014)
- Author(s)
  Mitsuaki Makino and Atsuhiko Kai
- Journal Title
  Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies
  Volume: - Pages: 413-418
- Related Report
  2014 Research-status Report
- Open Access
[Journal Article] Utilizing State-level Distance Vector Representation for Improved Spoken Term Detection by Text and Spoken Queries (2014)
- Author(s)
  Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai
- Journal Title
  Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
  Volume: - Pages: 1732-1736
- Related Report
  2014 Research-status Report
- Peer Reviewed
[Journal Article] Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording (2014)
- Author(s)
  Longbiao Wang, Bo Ren, Yuma Ueda, Atsuhiko Kai, Shunta Teraoka and Taku Fukushima
- Journal Title
  Proceedings of Asia-Pacific Signal Information Processing Association Annual Summit and Conference (APSIPA ASC)
  Volume: - Pages: 1-5
- DOI
  10.1109/apsipa.2014.7041548
- Related Report
  2014 Research-status Report
- Peer Reviewed
[Journal Article] Single-channel dereverberation for distant-talking speech recognition by combining denoising autoencoder and temporal structure normalization (2014)
- Author(s)
  Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Xiong Xiao, EngSiong Chng and Haizhou Li
- Journal Title
  Proceedings of the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP 2014)
  Volume: - Pages: 379-383
- DOI
  10.1109/iscslp.2014.6936613
- Related Report
  2014 Research-status Report
- Peer Reviewed
[Journal Article] Single-sided Approach to Discriminative PLDA Training for Text-Independent Speaker Verification without Using Expanded I-vector (2014)
- Author(s)
  Ikuya Hirano, Kong Aik Lee, Zhaofeng Zhang, Longbiao Wang and Atsuhiko Kai
- Journal Title
  Proceedings of the 9th International Symposium on Chinese Spoken Language Processing (ISCSLP 2014)
  Volume: - Pages: 59-63
- DOI
  10.1109/iscslp.2014.6936581
- Related Report
  2014 Research-status Report
- Peer Reviewed
[Journal Article] Using Acoustic Dissimilarity Measures Based on State-Level Distance Vector Representation for Improved Spoken Term Detection (2013)
- Author(s)
  Naoki Yamamoto, Atsuhiko Kai
- Journal Title
  Proc. of APSIPA Annual Summit and Conference 2013
  Volume: - Pages: 1-4
- DOI
  10.1109/apsipa.2013.6694151
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Journal Article] Improvement of distant-talking speaker identification using bottleneck features of DNN (2013)
- Author(s)
  Takanori Yamada, Longbiao Wang, Atsuhiko Kai
- Journal Title
  Proc. of INTERSPEECH 2013
  Volume: - Pages: 3661-3664
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Presentation] Combining State-level and DNN-based Acoustic Matches for Efficient Spoken Term Detection in NTCIR-12 SpokenQuery&Doc-2 Task (2016)
- Author(s)
  Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai
- Organizer
  NTCIR 12 Conference
- Place of Presentation
  National Center of Sciences (Tokyo)
- Year and Date
  2016-06-08
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
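[Presentation] Speech recognition under noisy and reverberant conditions using a cepstral-domain denoising autoencoder and DNN-HMM (2015)
- Author(s)
  Yuma Ueda, Longbiao Wang, Atsuhiko Kai
- Organizer
  Acoustical Society of Japan 2015 Spring Meeting
- Place of Presentation
  Chuo University, Korakuen Campus (Bunkyo-ku, Tokyo)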
- Year and Date
  2015-03-17
- Related Report
  2014 Research-status Report
[Presentation] Speech selection and environmental adaptation for asynchronous speech recording based on deep neural network (2014)
- Author(s)
  Bo Ren, Longbiao Wang and Atsuhiko Kai
- Organizer
  16th Spoken Language Symposium (IEICE)
- Place of Presentation
  Tokyo Institute of Technology, Suzukakedai Campus (Yokohama, Kanagawa)
- Year and Date
  2014-12-16
- Related Report
  2014 Research-status Report
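[Presentation] Speaker recognition in reverberant environments using DNN-based feature transformation (2014)
- Author(s)
  Zhaofeng Zhang, Longbiao Wang, Atsuhiko Kai, Weifeng Li, Masahiro Iwahashi
- Organizer
  16th Spoken Language Symposium (IEICE)
- Place of Presentation
  Tokyo Institute of Technology, Suzukakedai Campus (Yokohama, Kanagawa)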
- Year and Date
  2014-12-16
- Related Report
  2014 Research-status Report
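[Presentation] A study of deep neural networks and cross-adaptation for voice activity detection in meeting speech (2014)
- Author(s)
  中谷彰宏, Longbiao Wang, Atsuhiko Kai
- Organizer
  16th Spoken Language Symposium (IEICE)
- Place of Presentation
  Tokyo Institute of Technology, Suzukakedai Campus (Yokohama, Kanagawa)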
- Year and Date
  2014-12-15
- Related Report
  2014 Research-status Report
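[Presentation] Distant-talking speech recognition using asynchronous speech recording (2014)
- Author(s)
  Shunta Teraoka, Yuma Ueda, Longbiao Wang, Atsuhiko Kai, Taku Fukushima
- Organizer
  Ongaku Symposium 2014 (IEICE)
- Place of Presentation
  Nihon University, College of Humanities and Sciences Campus (Setagaya-ku, Tokyo)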
- Year and Date
  2014-05-24
- Related Report
  2014 Research-status Report
[Presentation] Spoken Term Detection Using Distance-Vector based Dissimilarity Measures and Its Evaluation on the NTCIR-10 SpokenDoc-2 Task
- Author(s)
  Naoki Yamamoto, Atsuhiko Kai
- Organizer
  The 10th NTCIR Conference
- Place of Presentation
  National Center of Sciences (Tokyo)
- Related Report
  2013 Research-status Report
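[Presentation] Application of deep belief networks to noise-robust voice activity detection
- Author(s)
  中谷彰宏, Longbiao Wang, Atsuhiko Kai
- Organizer
  Acoustical Society of Japan 2013 Autumn Meeting
- Place of Presentation
  Toyohashi University of Technology (Aichi)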
- Related Report
  2013 Research-status Report
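[Presentation] Improving spoken term detection by combining acoustic similarity based on inter-distribution distance vectors with subword posterior probabilities
- Author(s)
  Naoki Yamamoto, Atsuhiko Kai
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Place of Presentation
  University of Tsukuba, Bunkyo Campus (Tokyo)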
- Related Report
  2013 Research-status Report
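[Presentation] Improving spoken term detection from text and spoken queries using acoustic similarity based on inter-distribution distance vector representation
- Author(s)
  Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai
- Organizer
  8th Spoken Document Processing Workshop
- Place of Presentation
  Toyohashi Citizens' Center (Aichi)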
- Related Report
  2013 Research-status Report
