1998 Fiscal Year Final Research Report Summary

Model and example based prosodic feature extraction and its efficient integration for speech recognition along with phoneme-based recognition

Research Project

Project/Area Number	08680391
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Japan Advanced Institute of Science and Technology, Hokuriku
Principal Investigator	SHIMODAIRA Hiroshi JAIST, School of Information Science, Associate Professor (30206239)
Co-Investigator(Kenkyū-buntansha)	NAKAI Mitsuru JAIST, School of Information Science, Research Associate (60283149)
Project Period (FY)	1996 – 1998
Keywords	prosody / prosodic-boundary / pitch pattern / speech recognition
Research Abstract	The aim of this research is to exploit the prosodic information contained in speech for automatic speech recognition, since prosodic information plays an important role alongside phonemic information. (a) Robust pitch determination algorithm: In contrast to conventional pitch trackers based on numerical curve fitting, the proposed method estimates a continuous F_0 pattern using a quantitative pitch generation model of the kind often used to synthesize F_0 contours from prosodic event commands. An inverse filtering technique is employed to obtain the initial candidates for the prosodic commands. To find the optimal command sequence among these candidates efficiently, a beam-search algorithm and an N-best technique are employed. Preliminary experiments on a male speaker from the ATR B-set database showed promising results both in the quality of the restored pattern and in the estimation of the prosodic events. Along with this improvement in F_0 smoothing, a novel frame-wise pitch determination algorithm that provides a reliability measure for the pitch frequency was also proposed. (b) Prosodically guided speech recognition: i. As a first step toward speech recognition based on prosodic information, an isolated word recognition task in a noisy environment was studied. Experiments showed that the word pitch pattern helps reduce ambiguity in discriminating similar words. ii. It was shown that the dependencies between consecutive phrases can be measured by means of prosodic features; an accuracy of 87% was obtained on the ATR read speech data. iii. A prototype prosodically guided speech recognition system was developed, in which phrase hypotheses given by phoneme recognition are rescored on the basis of the likelihood of phrase boundaries measured from prosodic features.
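The rescoring step described in (b) iii can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the report: the `Hypothesis` structure, the log-linear combination of scores, the `weight` parameter, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One phrase-sequence hypothesis (hypothetical structure)."""
    phrases: List[str]               # phrase sequence from phoneme recognition
    phoneme_logprob: float           # log-likelihood from the phoneme recognizer
    boundary_logprobs: List[float]   # prosodic log-likelihood of each phrase boundary


def rescore(hyps: List[Hypothesis], weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by phoneme score plus weighted prosodic boundary score.

    The log-linear combination is an assumed scoring rule, not the
    project's actual formula.
    """
    def score(h: Hypothesis) -> float:
        return h.phoneme_logprob + weight * sum(h.boundary_logprobs)

    return sorted(hyps, key=score, reverse=True)


# Toy N-best list: the second hypothesis has a slightly worse phoneme score
# but a far more plausible phrase boundary according to the prosodic score.
nbest = [
    Hypothesis(["A", "B C"], phoneme_logprob=-10.0, boundary_logprobs=[-3.0]),
    Hypothesis(["A B", "C"], phoneme_logprob=-10.5, boundary_logprobs=[-0.5]),
]
best = rescore(nbest, weight=1.0)[0]
print(best.phrases)
```

With `weight=0.0` the ranking falls back to the phoneme scores alone, which makes it easy to see how much the prosodic term changes the result.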

Research Products
(18 results)

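[Publications] S. Kanno (漢野救泰): "Detection of Voiced Sound Segments in Non-stationary, High-Noise Environments Using the Prediction Residual of the Low-Frequency Spectrum" (in Japanese) IEICE Transactions D-II. J80-DII,1. 26-35 (1997)
- Description
  From the Research Report Summary (Japanese)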
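[Publications] Mitsuru Nakai (中井満): "Phrase Boundary Detection in Continuous Speech Based on Templates Using an F_0 Generation Model" (in Japanese) IEICE Transactions D-II. J80-DII,10. 2605-2614 (1997)
- Description
  From the Research Report Summary (Japanese)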
[Publications] Mitsuru Nakai: "Accent Phrase Segmentation by F_0 Clustering Using Superpositional Modelling" Computing Prosody, Springer. 343-359 (1997)
- Description
  From the Research Report Summary (Japanese)
[Publications] Mitsuru Nakai: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc. EuroSpeech'97. 243-246 (1997)
- Description
  From the Research Report Summary (Japanese)
[Publications] Hiroshi Shimodaira: "Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model" Proc. EuroSpeech'97. 521-524 (1997)
- Description
  From the Research Report Summary (Japanese)
[Publications] Hiroshi Shimodaira: "Modified Minimum Classification Error Learning and Its Application to Neural Networks" Advances in Pattern Recognition. 785-794 (1998)
- Description
  From the Research Report Summary (Japanese)
[Publications] Jun Rokui: "Improving the Generalization Performance of the Minimum Classification Error Learning and Its Application to Neural Networks" The Fifth International Conference on Neural Information Processing (ICONIP'98). 63-67 (1998)
- Description
  From the Research Report Summary (Japanese)
[Publications] Mitsuru Nakai: "The Use of F_0 Reliability Function for Prosodic Command Analysis on F_0 Contour Generation Model" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 998 (1998)
- Description
  From the Research Report Summary (Japanese)
[Publications] Hiroshi Shimodaira: "Improving the Generalization Performance of the MCE/GPD Learning" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 795 (1998)
- Description
  From the Research Report Summary (Japanese)
[Publications] S. Kanno and H. Shimodaira: "Voiced Sound Detection under Non-stationary and Heavy Noisy Environment Using the Prediction of Low-Frequency Spectrum" IEICE Trans. D-II, Vol. J80-DII, No. 1. 26-35 (1997)
- Description
  From the Research Report Summary (English)
[Publications] M. Nakai, H. Singer, Y. Sagisaka and H. Shimodaira: "Accent Phrase Segmentation on F_0 Templates Using a Superpositional Prosodic Model" IEICE Trans. D-II, Vol. J80-DII, No. 10. 2605-2614 (1997)
- Description
  From the Research Report Summary (English)
[Publications] Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, Hiroshi Shimodaira: "Accent Phrase Segmentation by F_0 Clustering Using Superpositional Modelling" Computing Prosody (Y. Sagisaka, N. Campbell, N. Higuchi, Eds.) Springer. 343-359 (1997)
- Description
  From the Research Report Summary (English)
[Publications] Mitsuru Nakai and Hiroshi Shimodaira: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc. EuroSpeech'97. 243-246 (1997)
- Description
  From the Research Report Summary (English)
[Publications] Hiroshi Shimodaira, Mitsuru Nakai and Akihiro Kumata: "Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model" Proc. EuroSpeech'97. 521-524 (1997)
- Description
  From the Research Report Summary (English)
[Publications] Hiroshi Shimodaira, Jun Rokui and Mitsuru Nakai: "Modified Minimum Classification Error Learning and Its Application to Neural Networks" Advances in Pattern Recognition (Joint IAPR International Workshops SSPR'98 and SPR'98). 785-794 (1998)
- Description
  From the Research Report Summary (English)
[Publications] Jun Rokui and Hiroshi Shimodaira: "Improving the Generalization Performance of the Minimum Classification Error Learning and Its Application to Neural Networks" The Fifth International Conference on Neural Information Processing (ICONIP'98). 63-67 (1998)
- Description
  From the Research Report Summary (English)
[Publications] Mitsuru Nakai and Hiroshi Shimodaira: "The Use of F_0 Reliability Function for Prosodic Command Analysis on F_0 Contour Generation Model" The 5th International Conference on Spoken Language Processing (ICSLP98). #998 (1998)
- Description
  From the Research Report Summary (English)
[Publications] Hiroshi Shimodaira and Jun Rokui: "Improving the Generalization Performance of the MCE/GPD Learning" The 5th International Conference on Spoken Language Processing (ICSLP98). #795 (1998)
- Description
  From the Research Report Summary (English)
