2005 Fiscal Year Final Research Report Summary

Universal-Phonetic-Segment-Based Speech Coding and Its Applications to Speech Processing

Research Project

Project/Area Number	15300026
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Media informatics/Database
Research Institution	University of Tsukuba
Principal Investigator	TANAKA Kazuyo, University of Tsukuba, Graduate School of Library, Information and Media Studies, Professor (70344207)
Co-Investigator (Kenkyū-buntansha)	ITOH Yoshiaki, Iwate Prefectural University, Faculty of Software and Information Science, Associate Professor (90325928); OKAWA Shigeki, Chiba Institute of Technology, Faculty of Information Science, Dept. of Information and Network Science, Associate Professor (40306395); KOJIMA Hiroaki, National Institute of Advanced Industrial Science and Technology, Information Technology Research Institute, Research Group Leader (80356980)
Project Period (FY)	2003 – 2005
Keywords	speech recognition / spoken document retrieval / phonetic code / IPA / Dynamic Programming / phone model / multilingual / open vocabulary
Research Abstract	In this project, we present a novel speech processing framework in which all acoustic speech samples are first encoded into universal phonetic segment (UPS) sequences, and spoken document processing (SDP) systems, such as recognition, retrieval, and indexing, are built on this UPS domain. Under this framework, the SDP systems are decoupled from the original acoustic correlates and environments. This gives the flexibility to handle recognition-type processing simply by calculating distances between UPS sequences, and to build the systems on distributed processing schemes. Through this project, we developed the following component techniques for this framework: 1) an original fine sub-phonetic segment (SPS) set used as the UPS set, which yielded high recognition performance and made multilingual speech easy to process; 2) effective dynamic programming (DP)-based sequence matching algorithms, called Shift CDP and Relay CDP. The effectiveness of the processing framework, the SPS set, and the DP-based algorithms was evaluated by constructing speech recognition and open-vocabulary spoken document retrieval (SDR) systems. Experimental results showed that the proposed SDP systems outperformed those based on conventional methods. Finally, we constructed a real-time open-vocabulary SDR system for demonstration, which retrieves broadcast video from a user's spoken query.
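
Illustrative sketch (not part of the original report): the abstract states that recognition-type processing reduces to calculating distances between UPS sequences with DP-based matching. The Python below shows that general idea with two textbook routines, a global edit distance and an approximate substring search with a free start point. It is not Shift CDP or Relay CDP, whose details are not given here. The function names, the unit costs, and the symbol labels in the example are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Global edit (Levenshtein) distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (x != y),  # match or substitution
                           prev[j] + 1,             # symbol of a left unmatched
                           cur[j - 1] + 1))         # symbol of b left unmatched
        prev = cur
    return prev[-1]


def spot_query(query: Sequence[str], document: Sequence[str]) -> Tuple[int, int]:
    """Find the document segment closest to the query.

    Returns (distance, end), where end is the exclusive index in the
    document at which the best-matching segment finishes. The match may
    start anywhere in the document (row 0 of the DP table costs nothing).
    """
    m = len(query)
    prev = list(range(m + 1))   # query prefixes against an empty segment
    best = (prev[m], 0)
    for j, sym in enumerate(document):
        cur = [0]               # free start: a segment may begin here
        for i in range(1, m + 1):
            cur.append(min(prev[i - 1] + (query[i - 1] != sym),
                           prev[i] + 1,
                           cur[i - 1] + 1))
        if cur[m] < best[0]:
            best = (cur[m], j + 1)
        prev = cur
    return best


# Hypothetical segment labels; real UPS/SPS labels are not listed in this record.
doc = ["s", "i", "k", "e", "t", "o"]
print(edit_distance(["k", "a", "t"], ["k", "e", "t"]))  # 1
print(spot_query(["k", "a", "t"], doc))                 # (1, 5)
```

Because both routines operate only on symbol sequences, the same code applies regardless of the acoustic front end that produced the symbols, which is the separation the abstract describes.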

Research Products
(25 results)

Journal Article (22 results), Book (2 results), Patent (Industrial Property Rights) (1 result)

[Journal Article] HMM-based noise-robust feature compensation (2006)
- Author(s)
  Akira Sasou
- Journal Title
  International Journal of Speech Communication (accepted, in publication)
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Combining Multiple subword representations for open-vocabulary spoken document retrieval (2005)
- Author(s)
  Lee, S.W.
- Journal Title
  Proc. of International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP2005), Vol. 1
  Pages: 505-508
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] An algorithm for similar utterance section extraction for managing spoken documents (2005)
- Author(s)
  Itoh, Y.
- Journal Title
  Multimedia Systems, ISSN 0942-4962, Vol. 10, No. 5
  Pages: 432-443
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] An Approach for Retrieving Inquiries in TV Broadcasts in a Disaster (2005)
- Author(s)
  K. Iwata
- Journal Title
  Proc. of IASTED International Conference on Signal and Image Processing, 1
  Pages: 34-39
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals (2005)
- Author(s)
  T. Taniguchi
- Journal Title
  Proceedings of Interspeech2005 1
  Pages: 589-592
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Combining Multiple subword representations for open-vocabulary spoken document retrieval (2005)
- Author(s)
  Lee, S.W., Tanaka, K., Itoh, Y.
- Journal Title
  Proc. of International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP2005), Vol. 1
  Pages: 505-508
- Description
  From the Research Report Summary (English version)
[Journal Article] An algorithm for similar utterance section extraction for managing spoken documents (2005)
- Author(s)
  Itoh, Y., Tanaka, K., Lee, S.W.
- Journal Title
  Multimedia Systems, ISSN 0942-4962, Vol. 10, No. 5
  Pages: 432-443
- Description
  From the Research Report Summary (English version)
[Journal Article] An Approach for Retrieving Inquiries in TV Broadcasts in Disaster (2005)
- Author(s)
  Kohei Iwata, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee
- Journal Title
  Proc. of IASTED International Conference on Signal and Image Processing
  Pages: 34-39
- Description
  From the Research Report Summary (English version)
[Journal Article] Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals (2005)
- Author(s)
  Toru Taniguchi, Akishige Adachi, Shigeki Okawa, Masaaki Honda, Katsuhiko Shirai
- Journal Title
  Proc. of Interspeech2005
  Pages: 589-592
- Description
  From the Research Report Summary (English version)
[Journal Article] Open-vocabulary spoken document retrieval based on multilingual subphonetic segment recognition (2004)
- Author(s)
  Lee, S.W.
- Journal Title
  Proc. of 18th International Congress on Acoustics (ICA2004), Vol. 2
  Pages: 1723-1726
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm (2004)
- Author(s)
  Itoh, Y.
- Journal Title
  Journal of Acoustical Society of America (JASA), Vol. 116, No. 2
  Pages: 1234-1243
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Robust spoken document retrieval based on multilingual subphonetic segment recognition (2004)
- Author(s)
  Lee, S.W.
- Journal Title
  Proc. of 6th International Conference on Enterprise Information Systems (CD-ROM)
  Pages: 1-7
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Open-vocabulary spoken document retrieval based on multilingual subphonetic segment recognition (2004)
- Author(s)
  Lee, S.W., Tanaka, K., Itoh, Y.
- Journal Title
  Proc. of 18th International Congress on Acoustics (ICA2004), Vol. 2
  Pages: 1723-1726
- Description
  From the Research Report Summary (English version)
[Journal Article] Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm (2004)
- Author(s)
  Itoh, Y., Tanaka, K.
- Journal Title
  Journal of Acoustical Society of America (JASA), Vol. 116, No. 2
  Pages: 1234-1243
- Description
  From the Research Report Summary (English version)
[Journal Article] Robust spoken document retrieval based on multilingual subphonetic segment recognition (2004)
- Author(s)
  Lee, S.W., Tanaka, K., Itoh, Y.
- Journal Title
  Proc. of 6th International Conference on Enterprise Information Systems (CD-ROM)
- Description
  From the Research Report Summary (English version)
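[Journal Article] Shift CDP: a fast matching method for arbitrary sub-intervals of time-series patterns (2003)
- Author(s)
  Yoshiaki Itoh
- Journal Title
  IEICE Transactions D-II (Japanese Edition), J85-D-II, No. 9
  Pages: 1267-1277
- Description
  From the Research Report Summary (Japanese version)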
[Journal Article] Mixed-Lingual Spoken Word Recognition by Using VQ Codebook Sequences of Variable Length Segments (2003)
- Author(s)
  Kojima, H.
- Journal Title
  Proc. of the European Conference on Speech Communication and Technology 4
  Pages: 2485-2488
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Statistical estimation of phoneme's most stable point based on universal constraints (2003)
- Author(s)
  Shigeki Okawa
- Journal Title
  Proc. of 7th European Conference on Speech Communication 2
  Pages: 781-784
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] A fast matching algorithm called shift continuous DP between arbitrary parts of two time sequence data sets (2003)
- Author(s)
  Yoshiaki Itoh
- Journal Title
  IEICE Trans. Information and Systems (Japanese Ed.), Vol. J89-D, No. 3
  Pages: 1267-1277
- Description
  From the Research Report Summary (English version)
[Journal Article] Mixed-Lingual Spoken Word Recognition by Using VQ Codebook Sequences of Variable Length Segments (2003)
- Author(s)
  Hiroaki Kojima, Kazuyo Tanaka
- Journal Title
  Proc. of the European Conference on Speech Communication and Technology
  Pages: 2485-2488
- Description
  From the Research Report Summary (English version)
[Journal Article] Statistical estimation of phoneme's most stable point based on universal constraints (2003)
- Author(s)
  Shigeki Okawa, Katsuhiko Shirai
- Journal Title
  Proceedings of 7th European Conference on Speech Communication and Technology
  Pages: 781-784
- Description
  From the Research Report Summary (English version)
[Journal Article] HMM-based noise-robust feature compensation
- Author(s)
  Akira Sasou, Futoshi Asano, Satoshi Nakamura, Kazuyo Tanaka
- Journal Title
  International Journal of Speech Communication (accepted, in publication)
- Description
  From the Research Report Summary (English version)
[Book] 音声工学 (Speech Technology) (2005)
- Author(s)
  Shuichi Itahashi
- Total Pages
  244
- Publisher
  Morikita Publishing (森北出版)
- Description
  From the Research Report Summary (Japanese version)
[Book] Speech Technology, ISBN4-627-82811 (2005)
- Author(s)
  S. Itahashi, K. Tanaka, et al.
- Total Pages
  244
- Publisher
  Morikita-Shuppan
- Description
  From the Research Report Summary (English version)
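[Patent (Industrial Property Rights)] Device for presenting visually and auditorily similar product names (視覚的かつ聴覚的類似品名提示装置) (2004)
- Inventor(s)
  Kazuyo Tanaka
- Industrial Property Rights Holder
  University of Tsukuba (National University Corporation)
- Industrial Property Number
  Application number: Japanese Patent Application No. 2004-271381 (特願2004-271381)
- Filing Date
  2004-09-17
- Description
  From the Research Report Summary (Japanese version)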
