Model and example based prosodic feature extraction and its efficient integration for speech recognition along with phoneme-based recognition

Research Project

Project/Area Number 08680391
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Japan Advanced Institute of Science and Technology, Hokuriku

Principal Investigator

SHIMODAIRA Hiroshi  JAIST, School of Information Science, Associate Professor (30206239)

Co-Investigator (Kenkyū-buntansha) NAKAI Mitsuru  JAIST, School of Information Science, Research Associate (60283149)
Project Period (FY) 1996 – 1998
Project Status Completed (Fiscal Year 1998)
Budget Amount
¥2,600,000 (Direct Cost: ¥2,600,000)
Fiscal Year 1998: ¥400,000 (Direct Cost: ¥400,000)
Fiscal Year 1997: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 1996: ¥1,300,000 (Direct Cost: ¥1,300,000)
Keywords prosody / prosodic-boundary / pitch pattern / speech recognition / Fujisaki model
Research Abstract

This research aims to exploit the prosodic information contained in speech for automatic speech recognition, in which prosodic information plays an important role alongside phonemic information.
(a) Robust pitch determination algorithm: In contrast to conventional pitch trackers based on numerical curve fitting, the proposed method estimates a continuous F_0 pattern using a quantitative pitch generation model of the kind often used to synthesize F_0 contours from prosodic event commands. An inverse filtering technique provides the initial candidates for the prosodic commands. To find the optimal command sequence among these candidates efficiently, a beam-search algorithm and an N-best technique are employed. Preliminary experiments on a male speaker from the ATR B-set database showed promising results in both the quality of the restored pattern and the estimation of prosodic events.
Along with the improvement of the F_0 smoothing technique above, a novel frame-wise pitch determination algorithm that provides a reliability measure for the pitch frequency was also proposed.
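As an illustration of the kind of pitch generation model referred to in (a), the following is a minimal sketch of the standard Fujisaki superpositional model, which adds phrase and accent components to a base frequency in the log-F_0 domain. It is not the project's implementation: the function names, the command tuples, and the constants alpha = 3.0, beta = 20.0, and gamma = 0.9 are assumptions chosen as commonly quoted typical values.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """Synthesize F_0 (Hz) at each time in `times` (seconds).

    phrase_cmds: list of (onset_s, amplitude)            -- hypothetical format
    accent_cmds: list of (onset_s, offset_s, amplitude)  -- hypothetical format
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        for onset, amp in phrase_cmds:
            log_f0 += amp * phrase_response(t - onset)
        for onset, offset, amp in accent_cmds:
            log_f0 += amp * (accent_response(t - onset) - accent_response(t - offset))
        contour.append(math.exp(log_f0))
    return contour

# Usage: one phrase command at 0.0 s and one accent command from 0.2 s to 0.5 s.
times = [i * 0.01 for i in range(100)]
contour = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
```

Estimating the commands from an observed contour, as described above, is the inverse problem: candidate command sequences are scored by how well a contour synthesized this way matches the observation.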
(b) Prosodically guided speech recognition:
i. As a first step toward speech recognition based on prosodic information, an isolated word recognition task in a noisy environment was examined. Experiments showed that word pitch patterns help reduce ambiguity when discriminating similar words.
ii. It was shown that the dependencies between consecutive phrases can be measured by means of prosodic features; an accuracy of 87% was obtained on the ATR read speech data.
iii. A prototype prosodically guided speech recognition system was developed, in which phrase hypotheses produced by phoneme recognition are rescored based on the likelihood of phrase boundaries measured from prosodic features.
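The rescoring idea in (b) iii can be sketched as follows. This is only an illustrative outline under stated assumptions, not the prototype system: the hypothesis format, the `boundary_prob` table, the interpolation `weight`, and the probability floor are all hypothetical. Each hypothesis carries a phoneme-recognition log-likelihood and the frames at which it places phrase boundaries, and its score is adjusted by the log-probability that prosody supports a boundary at those frames.

```python
import math

def rescore(hypotheses, boundary_prob, weight=1.0, floor=1e-6):
    """Rank phrase hypotheses by phoneme score plus weighted prosodic boundary score.

    hypotheses: list of (text, phoneme_log_likelihood, boundary_frames)
    boundary_prob: dict mapping frame index -> probability of a prosodic
                   phrase boundary at that frame (hypothetical input)
    Returns a list of (combined_score, text), best first.
    """
    scored = []
    for text, phoneme_ll, boundaries in hypotheses:
        # Sum log-probabilities of the boundaries this hypothesis proposes;
        # frames with no prosodic evidence fall back to the floor value.
        prosodic_ll = sum(
            math.log(max(boundary_prob.get(frame, 0.0), floor))
            for frame in boundaries
        )
        scored.append((phoneme_ll + weight * prosodic_ll, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

# Usage: hypothesis "B" is slightly worse phonemically, but its boundary
# position is better supported by the prosodic evidence.
hyps = [("A", -10.0, [5]), ("B", -10.5, [8])]
probs = {5: 0.1, 8: 0.9}
ranked = rescore(hyps, probs)
```

With `weight=0.0` the ranking reduces to the phoneme-only ordering, which makes the contribution of the prosodic term easy to inspect.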

Report

(4 results)
  • 1998 Annual Research Report   Final Research Report Summary
  • 1997 Annual Research Report
  • 1996 Annual Research Report
  • Research Products

    (30 results)

All Publications (30 results)

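  • [Publications] S. Kanno: "Detection of Voiced Segments in Non-stationary, High-Noise Environments Using the Prediction Residual of the Low-Frequency Spectrum" IEICE Transactions D-II. J80-DII,1. 26-35 (1997)

    • Description
      From the Summary of Research Results Report (Japanese)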
    • Related Report
      1998 Final Research Report Summary
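  • [Publications] Mitsuru Nakai: "Phrase Boundary Detection in Continuous Speech Based on Templates Using an F_0 Generation Model" IEICE Transactions D-II. J80-DII,10. 2605-2614 (1997)

    • Description
      From the Summary of Research Results Report (Japanese)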
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai: "Accent Phrase Segmentation by F_0 Clustering Using Superpositional Modelling" Computing Prosody, Springer. 343-359 (1997)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc.EuroSpeech'97. 243-246 (1997)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira: "Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model" Proc.EuroSpeech'97. 521-524 (1997)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira: "Modified Minimum Classification Error Learning and Its Application to Neural Networks" Advances in Pattern Recognition. 785-794 (1998)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Jun Rokui: "Improving the Generalization Performance of the Minimum Classification Error Learning and Its Application to Neural Networks" The Fifth International Conference on Neural Information Processing (ICONIP'98). 63-67 (1998)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai: "The Use of F_0 Reliability Function for Prosodic Command Analysis on F_0 Contour Generation Model" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 998 (1998)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira: "Improving the Generalization Performance of the MCE/GPD Learning" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 795 (1998)

    • Description
      From the Summary of Research Results Report (Japanese)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] S.Kanno and H.Shimodaira: "Voiced Sound Detection under Non-stationary and Heavy Noisy Environment Using the Prediction of Low-Frequency Spectrum" IEICE Trans.D-II,Vol.J80-DII,No.1. 26-35 (1997)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] M.Nakai, H.Singer, Y.Sagisaka and H.Shimodaira: "Accent Phrase Segmentation on F_0 Templates Using a Superpositional Prosodic Model" IEICE Trans.D-II,Vol.J80-DII,No.10. 2605-2614 (1997)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, Hiroshi Shimodaira: "Accent Phrase Segmentation by F_0 Clustering Using Superpositional Modelling" Computing Prosody (Y. Sagisaka, N. Campbell, N. Higuchi, Eds.) Springer. 343-359 (1997)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai and Hiroshi Shimodaira: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc. EuroSpeech'97. 243-246 (1997)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira, Mitsuru Nakai and Akihiro Kumata: "Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model" Proc. EuroSpeech'97. 521-524 (1997)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira, Jun Rokui and Mitsuru Nakai: "Modified Minimum Classification Error Learning and Its Application to Neural Networks" Advances in Pattern Recognition (Joint IAPR International Workshops SSPR'98 and SPR'98). 785-794 (1998)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Jun Rokui and Hiroshi Shimodaira: "Improving the Generalization Performance of the Minimum Classification Error Learning and Its Application to Neural Networks" The Fifth International Conference on Neural Information Processing (ICONIP'98). 63-67 (1998)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai and Hiroshi Shimodaira: "The Use of F_0 Reliability Function for Prosodic Command Analysis on F_0 Contour Generation Model" The 5th International Conference on Spoken Language Processing (ICSLP98). #998 (1998)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Hiroshi Shimodaira and Jun Rokui: "Improving the Generalization Performance of the MCE/GPD Learning" The 5th International Conference on Spoken Language Processing (ICSLP98). #795 (1998)

    • Description
      From the Summary of Research Results Report (English)
    • Related Report
      1998 Final Research Report Summary
  • [Publications] Mitsuru Nakai: "The Use of F_0 Reliability Function for Prosodic Command Analysis on F_0 Contour Generation Model" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 998 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] Hiroshi Shimodaira: "Improving the Generalization Performance of the MCE/GPD Learning" Proc. of the 5th International Conference on Spoken Language Processing (ICSLP98). 795 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] Mitsuru Nakai: "On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function" Proc. EuroSpeech'97. 243-246 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Hiroshi Shimodaira: "Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model" Proc. EuroSpeech'97. 521-524 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Mitsuru Nakai: "Phrase Boundary Detection in Continuous Speech Based on Templates Using an F_0 Generation Model" IEICE Transactions D-II. J80-D-II, No. 10. 2605-2614 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Mitsuru Nakai: "Command Estimation for the F_0 Control Mechanism Using an F_0 Reliability Field" Acoustical Society of Japan, 1998 Spring Meeting. (1998)

    • Related Report
      1997 Annual Research Report
  • [Publications] 川崎真護: "Recognition of Noise-Corrupted Spoken Words Using Pitch Pattern Matching Based on an F_0 Generation Model" Acoustical Society of Japan, 1998 Spring Meeting. (1998)

    • Related Report
      1997 Annual Research Report
  • [Publications] Paul Taylor: "Using Prosodic Information to Improve Recognition Accuracy for Spoken Dialogue" Proc. of International Conference on Spoken Language Processing 96. 1. 216-219 (1996)

    • Related Report
      1996 Annual Research Report
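  • [Publications] Akihiro Kumata: "A Pitch Pattern Reconstruction Method by Command Search on the F_0 Generation Process Model" Proceedings of the Acoustical Society of Japan Spring Meeting. (1997)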

    • Related Report
      1996 Annual Research Report
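  • [Publications] Mitsuru Nakai: "Automatic Estimation of Accent Phrase Boundaries Using F_0 Pattern Matching That Requires No F_0 Determination" Proceedings of the Acoustical Society of Japan Spring Meeting. (1997)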

    • Related Report
      1996 Annual Research Report
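  • [Publications] 高倉健次: "A Study on Estimating the Dependency Structure of Prosodic Phrases Using F_0 Template Bigrams" Proceedings of the Acoustical Society of Japan Spring Meeting. (1997)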

    • Related Report
      1996 Annual Research Report
  • [Publications] Yoshinori Sagisaka: "Computing Prosody" Springer, 401 (1997)

    • Related Report
      1996 Annual Research Report

Published: 1996-04-01   Modified: 2016-04-21  

Information User Guide FAQ News Terms of Use Attribution of KAKENHI

Powered by NII kakenhi