Continuous speech recognition with adaptability to the speaking rate of the input speech

Research Project

Project/Area Number	07458064
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Tohoku University
Principal Investigator	MAKINO Shozo Tohoku Univ., Computer Center, Professor (00089806)
Co-Investigator(Kenkyū-buntansha)	SUZUKI Motoyuki Tohoku Univ., Computer Center, Research Associate (30282015); SONE Hideaki Tohoku Univ., Graduate School of Information Sciences, Assoc. Prof. (40134019); ITO Akinori Yamagata Univ., Faculty of Engineering, Lecturer (70232428); ABE Masato Tohoku Univ., Computer Center, Assoc. Prof. (00159443)
Project Period (FY)	1995 – 1997
Project Status	Completed (Fiscal Year 1997)
Budget Amount	¥6,400,000 (Direct Cost: ¥6,400,000) Fiscal Year 1997: ¥900,000 (Direct Cost: ¥900,000) Fiscal Year 1996: ¥700,000 (Direct Cost: ¥700,000) Fiscal Year 1995: ¥4,800,000 (Direct Cost: ¥4,800,000)
Keywords	continuous speech recognition / phoneme recognition / speaking rate / speaker adaptation / duration / preliminary recognition
Research Abstract	This research developed a spoken word recognition system that uses phoneme duration information estimated from the speaking rate of the input speech. The speaking rate is assumed to be reflected in the average vowel length. The acoustic processor transforms the input speech into a similarity matrix using the modified LVQ2. The average vowel length is computed from the preliminary recognition result, and the duration of each phoneme in each word template is then estimated from this average vowel length. Spoken word recognition experiments were carried out using DTW, taking the estimated phoneme durations into account. The word recognition score was 97.3% for the 212-word vocabulary uttered by 5 male speakers (test set). The phoneme duration information was collected from the 212-word vocabulary uttered by another 5 male and 10 female speakers (training set). A hybrid combination of the preceding-phoneme-dependent estimation and the following-phoneme-dependent estimation gave the best performance. The method was also extended to phoneme recognition, where the phoneme accuracy increased from 71.8% to 86.3% for phonemes in the 212-word vocabulary uttered by 5 male speakers (test set).
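
The duration estimation step in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the linear scaling rule, the function names (`average_vowel_length`, `estimate_durations`), the vowel set, and all numbers in the usage lines are assumptions chosen only to show the idea of deriving per-phoneme durations from the average vowel length of a preliminary recognition result. The context-dependent (preceding/following phoneme) estimation and the DTW stage are not modelled here.

```python
from statistics import mean

# Assumption: the five Japanese vowels, written as single-letter labels.
VOWELS = {"a", "i", "u", "e", "o"}


def average_vowel_length(segments):
    """Average length (in frames) of the vowels in a preliminary result.

    `segments` is a list of (phoneme_label, n_frames) pairs, standing in
    for the output of a preliminary recognition pass.
    """
    lengths = [n for label, n in segments if label in VOWELS]
    if not lengths:
        raise ValueError("no vowels in the preliminary recognition result")
    return mean(lengths)


def estimate_durations(template, avg_vowel_len, ref_durations, ref_avg_vowel_len):
    """Estimate a duration for every phoneme of a word template.

    Assumption: each reference duration (e.g. a training-set mean) is
    scaled linearly by the ratio of the observed average vowel length to
    the training-set average vowel length.
    """
    rate = avg_vowel_len / ref_avg_vowel_len
    return [(label, ref_durations[label] * rate) for label in template]


# Hypothetical usage: a slow utterance whose vowels are 1.5x the reference.
preliminary = [("s", 6), ("a", 10), ("k", 5), ("e", 14)]
avg = average_vowel_length(preliminary)
reference = {"s": 6, "a": 8, "k": 4, "e": 8}
estimated = estimate_durations(["s", "a", "k", "e"], avg, reference, 8)
```

In such a scheme the estimated durations would then act as a constraint or penalty during template matching, so that alignments with implausible phoneme lengths for the observed speaking rate are disfavoured.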

Report

(4 results)

1997 Annual Research Report Final Research Report Summary
1996 Annual Research Report
1995 Annual Research Report

Research Products
(22 results)

All Publications (22 results)

[Publications] M.SUZUKI, S.MAKINO et al.: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans.on Information and Systems. E78-D, 6. 662-668 (1995)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1997 Final Research Report Summary
[Publications] H.MORI, H.ASO, S.MAKINO: "Robust n-gram Model of Japanese Character and its Application to Document Recognition" IEICE Trans.on Information and Systems. E79-D, 5. 471-476 (1996)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1997 Final Research Report Summary
[Publications] Y.Okimoto, S.Makino: "Phoneme recognition using reference patterns constructed with discriminative training and DP matching" Jour.Acoust.Soc.America. 100, 4. 2791-2791 (1996)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1997 Final Research Report Summary
[Publications] M.SUZUKI,S.MAKINO,A.ITO,H.ASO,H.SHIMODAIRA: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans.on Information and Systems. E78-D,6. 662-668 (1995)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1997 Final Research Report Summary
[Publications] H.MORI,H.ASO,S.MAKINO: "Robust n-gram Model of Japanese Character and its Application to Document Recognition" IEICE Trans.on Information and Systems. E79-D,5. 471-476 (1996)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1997 Final Research Report Summary
[Publications] Y.Okimoto, S.Makino: "Phoneme recognition using reference patterns constructed with discriminative training and DP matching." Jour.Acoust.Soc.America. 100,4. 2791-2791 (1996)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1997 Final Research Report Summary
[Publications] S.MAKINO, M.SUZUKI, A.HARADA: "Automatic Acquisition of Language Model using HMnet" Proc.Int.Conf. Speech Processing'97. I. 47-54 (1997)
- Related Report
  1997 Annual Research Report
[Publications] Harada, Suzuki, Makino: "Acquisition of bunsetsu (phrase) models from newspaper articles using a discrete HMnet" IEICE Technical Report. SP97-24. 45-50 (1997)
- Related Report
  1997 Annual Research Report
[Publications] Abe, Suzuki, Makino, Aso: "A speaker adaptation method based on phoneme-wise speaker clustering" IEICE Technical Report. SP97-74. 41-46 (1997)
- Related Report
  1997 Annual Research Report
[Publications] Mori, Aso, Makino: "A statistical language model based on character strings that takes reproducibility into account" IEICE Technical Report. NLC97-47. 29-34 (1997)
- Related Report
  1997 Annual Research Report
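[Publications] Suzuki, Aso, Makino: "Speaker-independent phoneme recognition using an HMnet based on SSS-free" Proceedings of the Meeting of the Acoustical Society of Japan. Spring issue. 143-144 (1996)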
- Related Report
  1996 Annual Research Report
[Publications] Osaka, Makino: "Phoneme recognition using phoneme duration prediction based on speaking rate" IEICE Technical Report. Vol. 96 No. 93. 1-6 (1996)
- Related Report
  1996 Annual Research Report
[Publications] Okimoto, Makino: "Phoneme recognition using variable-length patterns and discriminative training" IEICE Technical Report. Vol. 96 No. 93. 7-14 (1996)
- Related Report
  1996 Annual Research Report
[Publications] Y. Okimoto, S. Makino: "Phoneme Recognition using reference patterns constructed with discriminative training and DP matching" The Journal of the Acoustical Society of America. Vol. 100 No. 4. 2757-2757 (1996)
- Related Report
  1996 Annual Research Report
[Publications] M. Suzuki, S. Makino: "Acquisition of language models based on HMnet" The Journal of the Acoustical Society of America. Vol. 100 No. 4. 2791-2791 (1996)
- Related Report
  1996 Annual Research Report
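[Publications] MAKINO Shozo: "The Tohoku University-Matsushita spoken word database" Jinbungaku to Joho Shori (Humanities and Information Processing). No. 12. 56-59 (1996)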
- Related Report
  1996 Annual Research Report
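[Publications] Koga, Makino, Kido: "Effects of the time window and lifter on isolated vowel recognition using local peaks" Journal of the Acoustical Society of Japan. 51. 130-132 (1995)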
- Related Report
  1995 Annual Research Report
[Publications] Ito, Makino: "Word preselection for continuous speech recognition using the extended PHA method" IEICE Transactions D-II (Japanese edition). J-78-D-II. 400-408 (1995)
- Related Report
  1995 Annual Research Report
[Publications] M.SUZUKI, S.MAKINO, H.ASO, H.SHIMODAIRA: "A New HMnet Construction Algorithm Requiring No Contextual Factors" IEICE Trans. Inf. & Syst. E78-D. 662-668 (1995)
- Related Report
  1995 Annual Research Report
[Publications] Suzuki, Makino, Aso: "Application of a discrete HMnet to language modeling" IEICE Technical Report. SP95-33. 65-72 (1995)
- Related Report
  1995 Annual Research Report
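[Publications] Okimoto, Makino, Sone: "Phoneme segmentation using DP matching with a probabilistic measure" Proceedings of the Meeting of the Acoustical Society of Japan. I. 165-166 (1995)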
- Related Report
  1995 Annual Research Report
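[Publications] Osaka, Makino, Sone: "Effect of duration prediction based on preliminary recognition results on phoneme recognition" Proceedings of the Meeting of the Acoustical Society of Japan. I. 55-56 (1995)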
- Related Report
  1995 Annual Research Report
