Continuous speech recognition with adaptability to the speaking rate of an input speech

Research Project

Project/Area Number 07458064
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Tohoku University

Principal Investigator

MAKINO Shozo  Tohoku Univ., Computer Center, Prof. (00089806)

Co-Investigator (Kenkyū-buntansha) SUZUKI Motoyuki  Tohoku Univ., Computer Center, Research Associate (30282015)
SONE Hideaki  Tohoku Univ., Graduate School of Information Sciences, Assoc. Prof. (40134019)
ITO Akinori  Yamagata Univ., Faculty of Engineering, Lecturer (70232428)
ABE Masato  Tohoku Univ., Computer Center, Assoc. Prof. (00159443)
Project Period (FY) 1995 – 1997
Project Status Completed (Fiscal Year 1997)
Budget Amount
¥6,400,000 (Direct Cost: ¥6,400,000)
Fiscal Year 1997: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 1996: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 1995: ¥4,800,000 (Direct Cost: ¥4,800,000)
Keywords continuous speech recognition / phoneme recognition / speaking rate / speaker adaptation / duration / preliminary recognition
Research Abstract

This research developed a spoken word recognition system that uses phoneme duration information estimated from the speaking rate of the input speech. The speaking rate is assumed to be reflected in the average vowel length. The acoustic processor transforms the input speech into a similarity matrix using the modified LVQ2. The average vowel length is computed from the preliminary recognition result, and the duration of each phoneme in each word template is then estimated from this average vowel length. Spoken word recognition experiments were carried out using DTW with the estimated phoneme durations taken into account. The word recognition score was 97.3% for the 212-word vocabulary uttered by 5 male speakers (test set). The phoneme duration information was collected from the 212-word vocabulary uttered by another 5 male and 10 female speakers (training set). A hybrid combination of the preceding-phoneme-dependent estimation and the following-phoneme-dependent estimation gave the best performance.
The method was then extended to phoneme recognition. The phoneme accuracy increased from 71.8% to 86.3% for phonemes in the 212-word vocabulary uttered by 5 male speakers (test set).
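
The duration estimation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the estimation formula, so everything below is an assumption made for illustration: the linear scaling of the average vowel length by a per-context ratio, the equal-weight average used for the "hybrid" of the preceding-phoneme and following-phoneme estimates, the absolute-difference penalty, and all names (`average_vowel_length`, `estimate_durations`, `duration_penalty`, `prev_ratio`, `next_ratio`) are hypothetical and are not taken from the project reports.

```python
from statistics import mean

# Assumption: the five Japanese vowels, written as single symbols.
VOWELS = {"a", "i", "u", "e", "o"}


def average_vowel_length(segments):
    """Mean vowel length in frames.

    segments: list of (phoneme, n_frames) pairs, standing in for the
    preliminary recognition result mentioned in the abstract.
    """
    lengths = [n for phoneme, n in segments if phoneme in VOWELS]
    if not lengths:
        raise ValueError("no vowels in the preliminary recognition result")
    return mean(lengths)


def estimate_durations(template, avg_vowel_len, prev_ratio, next_ratio, default=1.0):
    """Predict a duration (in frames) for each phoneme of a word template.

    prev_ratio[(preceding, phoneme)] and next_ratio[(phoneme, following)] are
    hypothetical tables of "duration / average vowel length" that would be
    learned from a training set. The hybrid estimate here is simply the mean
    of the two context-dependent ratios (an assumed combination rule).
    """
    padded = ["#"] + list(template) + ["#"]  # "#" marks a word boundary
    predicted = []
    for i in range(1, len(padded) - 1):
        phoneme = padded[i]
        r_prev = prev_ratio.get((padded[i - 1], phoneme), default)
        r_next = next_ratio.get((phoneme, padded[i + 1]), default)
        predicted.append(avg_vowel_len * (r_prev + r_next) / 2)
    return predicted


def duration_penalty(observed, predicted, weight=1.0):
    """Assumed penalty that could be added to a DTW path cost: the weighted
    sum of absolute differences between observed and predicted durations."""
    return weight * sum(abs(o - p) for o, p in zip(observed, predicted))


# Toy data, invented for this example only.
segments = [("s", 6), ("a", 10), ("k", 4), ("e", 14)]
prev_ratio = {("#", "s"): 0.5, ("s", "a"): 1.0}
next_ratio = {("s", "a"): 0.5, ("a", "k"): 1.0}

avg = average_vowel_length(segments)
predicted = estimate_durations(["s", "a", "k", "e"], avg, prev_ratio, next_ratio)
penalty = duration_penalty([n for _, n in segments], predicted)
print(avg, predicted, penalty)
```

In a recognizer of the kind the abstract describes, a penalty like this would presumably be combined with the acoustic similarity score along each DTW path, so that word templates whose predicted durations disagree with the input are ranked lower.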

Report

(4 results)
  • 1997 Annual Research Report   Final Research Report Summary
  • 1996 Annual Research Report
  • 1995 Annual Research Report
  • Research Products

    (22 results)

All Publications (22 results)

  • [Publications] M.SUZUKI, S.MAKINO et al.: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans.on Information and Systems. E78-D, 6. 662-668 (1995)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] H.MORI, H.ASO, S.MAKINO: "Robust n-gram Model of Japanese Character and its Application to Document Recognition" IEICE Trans.on Information and Systems. E79-D, 5. 471-476 (1996)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Y.Okimoto, S.Makino: "Phoneme recognition using reference patterns constructed with discriminative training and DP matching" Jour.Acoust.Soc.America. 100, 4. 2791-2791 (1996)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] M.SUZUKI,S.MAKINO,A.ITO,H.ASO,H.SHIMODAIRA: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans.on Information and Systems. E78-D,6. 662-668 (1995)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] H.MORI,H.ASO,S.MAKINO: "Robust n-gram Model of Japanese Character and its application to Document Recognition" IEICE Trans.on Information and Systems. E79-D,5. 471-476 (1996)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] Y.Okimoto, S.Makino: "Phoneme recognition using reference patterns constructed with discriminative training and DP matching." Jour.Acoust.Soc.America. 100,4. 2791-2791 (1996)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1997 Final Research Report Summary
  • [Publications] S.MAKINO, M.SUZUKI, A.HARADA: "Automatic Acquisition of Language Model using HMnet" Proc.Int.Conf.Speech Processing'97. I. 47-54 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Harada, Suzuki, Makino: "Acquisition of bunsetsu (phrase) models from newspaper articles using discrete HMnet" IEICE Technical Report. SP97-24. 45-50 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Abe, Suzuki, Makino, Aso: "A speaker adaptation method based on phoneme-wise speaker clustering" IEICE Technical Report. SP97-74. 41-46 (1997)

    • Related Report
      1997 Annual Research Report
  • [Publications] Mori, Aso, Makino: "A statistical language model based on character strings that takes reproducibility into account" IEICE Technical Report. NLC97-47. 29-34 (1997)

    • Related Report
      1997 Annual Research Report
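  • [Publications] Suzuki, Aso, Makino: "Speaker-independent phoneme recognition using HMnet based on SSS-free" Proceedings of the Acoustical Society of Japan Meeting. Spring issue. 143-144 (1996)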

    • Related Report
      1996 Annual Research Report
  • [Publications] Osaka, Makino: "Phoneme recognition using phoneme duration prediction based on speaking rate" IEICE Technical Report. Vol. 96 No. 93. 1-6 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] Okimoto, Makino: "Phoneme recognition using variable-length patterns and discriminative training" IEICE Technical Report. Vol. 96 No. 93. 7-14 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] Y. Okimoto, S. Makino: "Phoneme Recognition using reference patterns constructed with discriminative training and DP matching" THE JOURNAL of the Acoustical Society of America. Vol. 100 No. 4. 2757-2757 (1996)

    • Related Report
      1996 Annual Research Report
  • [Publications] M. Suzuki, S. Makino: "Acquisition of language models based on HMnet" THE JOURNAL of the Acoustical Society of America. Vol. 100 No. 4. 2791-2791 (1996)

    • Related Report
      1996 Annual Research Report
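  • [Publications] Makino Shozo: "The Tohoku University-Matsushita spoken word database" Jinbungaku to Joho Shori (Humanities and Information Processing). No. 12. 56-59 (1996)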

    • Related Report
      1996 Annual Research Report
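  • [Publications] Koga, Makino, Kido: "Effects of the time window and lifter on isolated vowel recognition using local peaks" Journal of the Acoustical Society of Japan. 51. 130-132 (1995)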

    • Related Report
      1995 Annual Research Report
  • [Publications] Ito, Makino: "Word preselection for continuous speech recognition using the extended PHA method" IEICE Transactions D-II (Japanese edition). J-78-D-II. 400-408 (1995)

    • Related Report
      1995 Annual Research Report
  • [Publications] M.SUZUKI, S.MAKINO, H.ASO, H.SHIMODAIRA: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans. Inf. & Syst. E78-D. 662-668 (1995)

    • Related Report
      1995 Annual Research Report
  • [Publications] Suzuki, Makino, Aso: "Application of discrete HMnet to language modeling" IEICE Technical Report. SP95-33. 65-72 (1995)

    • Related Report
      1995 Annual Research Report
  • [Publications] Okimoto, Makino, Sone: "Phoneme segmentation using DP matching with a probabilistic measure" Proceedings of the Acoustical Society of Japan Meeting. I. 165-166 (1995)

    • Related Report
      1995 Annual Research Report
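  • [Publications] Osaka, Makino, Sone: "Effect of duration prediction based on preliminary recognition results in phoneme recognition" Proceedings of the Acoustical Society of Japan Meeting. I. 55-56 (1995)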

    • Related Report
      1995 Annual Research Report

Published: 1995-04-01   Modified: 2016-04-21  
