Development of high-accuracy system for recognizing spontaneous speech

Research Project

Project/Area Number	22500144
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Yamagata University
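Principal Investigator	KOSAKA Tetsuo Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
Co-Investigator (Renkei-kenkyūsha)	KATO Masaharu Yamagata University, Graduate School of Science and Engineering, Assistant Professor (10250953)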
Project Period (FY)	2010 – 2012
Project Status	Completed (Fiscal Year 2012)
Budget Amount	¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000); Fiscal Year 2012: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2011: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2010: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
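Keywords	speech recognition / spontaneous speech / acoustic model / language model / speaker adaptation / spontaneous speech recognition / unsupervised speaker adaptation / word graph combination / cross-validation / speaker indexing / speaker vector / cross adaptation / context-dependent phone model / speaker-class acoustic model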
Research Abstract	This research aimed to improve the performance of systems for recognizing spontaneous speech, which is considered more difficult to recognize than read speech. We focused on three technical issues: (1) acoustic and language models, (2) system combination techniques, and (3) speaker indexing. To improve the acoustic models, we investigated a discrete-mixture hidden Markov model based on discriminative training, speaker-class models, quinphone models, and reverberation-class models. We investigated several system combination techniques, including the combination of continuous and discrete models, the combination of various quinphones, and the combination of reverberation-class models. For language models, we proposed cross adaptation and cross-validation adaptation techniques. In addition, we improved the performance of speaker indexing based on speaker vectors, which is required when performing speaker adaptation.

Report

(4 results)

2012 Annual Research Report Final Research Report ( PDF )
2011 Annual Research Report
2010 Annual Research Report

Research Products
(52 results)


All Journal Article (22 results) (of which Peer Reviewed: 20 results) Presentation (25 results) Book (4 results) Remarks (1 results)

[Journal Article] A time-synchronous histogram equalization for noise robust speech recognition (2013)
- Author(s)
  Fumiya Takahashi, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of ICA
  
  Volume: (accepted for publication) Pages: 5-5
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] An investigation of vowel substitution rules in the automatic evaluation system of English pronunciation (2013)
- Author(s)
  Kei Sato, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of ICA
  
  Volume: (accepted for publication) Pages: 5-5
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Speech recognition with discrete-mixture HMMs using discriminative training (2013)
- Author(s)
  小坂哲夫,加藤正治
- Journal Title
  
  IPSJ Journal
  
  Volume: Vol. 54 No. 2 Pages: 436-442
- NAID
  110009537036
- URL
  https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&item_id=90262&item_no=1&page_id=13&block_id=8
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Speech recognition with discrete-mixture HMMs using discriminative training (2013)
- Author(s)
  小坂哲夫，加藤正治
- Journal Title
  
  IPSJ Journal
  
  Volume: 54 Pages: 436-442
- NAID
  110009537036
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] An investigation of vowel substitution rules in the automatic evaluation system of English pronunciation (2013)
- Author(s)
  Kei Sato, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of International Congress on Acoustics 2013
  
  Volume: 1 Pages: 1-5
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] A time-synchronous histogram equalization for noise robust speech recognition (2013)
- Author(s)
  Fumiya Takahashi, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of International Congress on Acoustics 2013
  
  Volume: 1 Pages: 1-5
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] Unsupervised Cross-Adaptation Approach for Speech Recognition by Combined Language Model and Acoustic Model Adaptation (2011)
- Author(s)
  Tetsuo Kosaka, Taro Miyamoto and Masaharu Kato
- Journal Title
  
  Proc. of APSIPA ASC 2011, Thu-PM
  
  Pages: 4-4
- URL
  http://www.apsipa.org/proceedings_2011/pdf/APSIPA177.pdf
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Speaker Vector-Based Verification by Phonetic Class-Based Modeling (2011)
- Author(s)
  Tetsuo Kosaka, Naoki Tadokoro, Masaharu Kato and Masaki Kohda
- Journal Title
  
  Journal of Information Assurance and Security
  
  Volume: Vol. 6, No.3 Pages: 186-194
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Lecture Speech Recognition Using Discrete-Mixture HMMs (2011)
- Author(s)
  Tetsuo Kosaka, Akiyoshi Yamamoto, Takuya Kumakura, Masaharu Kato and Masaki Kohda
- Journal Title
  
  IEEJ Transactions on Electrical and Electronic Engineering
  
  Volume: Vol. 6 No. 1 Issue: 1 Pages: 23-29
- DOI
  10.1002/tee.20602
- NAID
  10027629753
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Unsupervised Cross-Adaptation Approach for Speech Recognition by Combined Language Model and Acoustic Model Adaptation (2011)
- Author(s)
  Tetsuo Kosaka, Taro Miyamoto, Masaharu Kato
- Journal Title
  
  Proc. of APSIPA ASC 2011
  
  Volume: (CD-ROM)
- Related Report
  2011 Annual Research Report
- Peer Reviewed
[Journal Article] Lecture Speech Recognition Using Discrete-Mixture HMMs (2011)
- Author(s)
  Tetsuo Kosaka, Akiyoshi Yamamoto, Takuya Kumakura, Masaharu Kato, Masaki Kohda
- Journal Title
  
  IEEJ Transactions on Electrical and Electronic Engineering
  
  Volume: Vol.6, No.1 Pages: 23-29
- NAID
  10027629753
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Speaker Vector-Based Verification by Phonetic Class-Based Modeling (2011)
- Author(s)
  Tetsuo Kosaka, Naoki Tadokoro, Masaharu Kato, Masaki Kohda
- Journal Title
  
  Journal of Information Assurance and Security
  
  Volume: Vol. 6, No. 3 Pages: 186-194
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Performance Improvement in Automatic Evaluation System of English Pronunciation by Using Various Normalization Methods (2010)
- Author(s)
  Masaru Kusumi, Masaharu Kato, Tetsuo Kosaka and Itaru Matsunaga
- Journal Title
  
  Proc. of International Congress on Acoustics 2010
  
  Volume: 257 Pages: 6-6
- URL
  http://www.acoustics.asn.au/conference_proceedings/ICA2010/cdrom-ICA2010/papers/p257.pdf
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Speech Recognition in Noise by Using Word Graph Combinations (2010)
- Author(s)
  Shunsuke Kuramata, Masaharu Kato and Tetsuo Kosaka
- Journal Title
  
  Proc. of International Congress on Acoustics 2010
  
  Volume: 341 Pages: 6-6
- URL
  http://www.acoustics.asn.au/conference_proceedings/ICA2010/cdrom-ICA2010/papers/p341.pdf
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Speaker Adaptation Based on System Combination Using Speaker-Class Models (2010)
- Author(s)
  Tetsuo Kosaka, Takashi Ito, Masaharu Kato and Masaki Kohda
- Journal Title
  
  Proc. of Interspeech2010
  
  Pages: 546-549
- URL
  http://www.isca-speech.org/archive/interspeech_2010/i10_0546.html
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Lecture Speech Recognition by Combining Word Graphs of Various Acoustic Models (2010)
- Author(s)
  Tetsuo Kosaka, Keisuke Goto, Takashi Ito and Masaharu Kato
- Journal Title
  
  Proc. of Interspeech2010
  
  Pages: 2978-2981
- URL
  http://www.isca-speech.org/archive/interspeech_2010/i10_2978.html
- Related Report
  2012 Final Research Report
- Peer Reviewed
[Journal Article] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition (2010)
- Author(s)
  Tetsuo Kosaka, Yuui Takeda, Takashi Ito, Masaharu Kato, Masaki Kohda
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: Vol. E93-D, No. 9 Pages: 2363-2369
- NAID
  10027640196
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Speech Recognition in Noise by Using Word Graph Combinations (2010)
- Author(s)
  Shunsuke Kuramata, Masaharu Kato, Tetsuo Kosaka
- Journal Title
  
  Proc. of International Congress on Acoustics 2010
  
  Volume: CD-ROM
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Speaker Adaptation Based on System Combination Using Speaker-Class Models (2010)
- Author(s)
  Tetsuo Kosaka, Takashi Ito, Masaharu Kato, Masaki Kohda
- Journal Title
  
  Proc. of Interspeech 2010
  
  Volume: CD-ROM Pages: 546-549
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Lecture Speech Recognition by Combining Word Graphs of Various Acoustic Models (2010)
- Author(s)
  Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Kato
- Journal Title
  
  Proc. of Interspeech 2010
  
  Volume: CD-ROM Pages: 2978-2981
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Journal Article] Lecture speech recognition based on word graph combination using quinphone HM-Nets (2010)
- Author(s)
  加藤正治, 小坂哲夫, 伊藤彰則, 牧野正三
- Journal Title
  
  IEICE Technical Report
  
  Volume: SP2010-28 Pages: 37-42
- NAID
  110007969989
- Related Report
  2010 Annual Research Report
[Journal Article] Speech recognition in various noise environments using word graph combination (2010)
- Author(s)
  倉又俊輔, 加藤正治, 小坂哲夫
- Journal Title
  
  IEICE Technical Report
  
  Volume: SP2010-41 Pages: 37-42
- NAID
  110007890249
- Related Report
  2010 Annual Research Report
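[Presentation] Optimization of various parameters in unsupervised language adaptation by cross-validation (2013)
- Author(s)
  高木瑛,加藤正治,小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University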
- Year and Date
  2013-03-11
- Related Report
  2012 Final Research Report
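[Presentation] HMM-based speech synthesis using prosodic information of input speech (2013)
- Author(s)
  栗原大樹,加藤正治,小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University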
- Year and Date
  2013-03-11
- Related Report
  2012 Final Research Report
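[Presentation] A study of clustering methods for lecture speech recognition using speaker-class acoustic models (2012)
- Author(s)
  今野和樹,大山拓也,加藤正治,小坂哲夫
- Organizer
  IPSJ SIG Technical Report on Spoken Language Processing
- Place of Presentation
  Tokyo Institute of Technology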
- Year and Date
  2012-12-21
- Related Report
  2012 Annual Research Report 2012 Final Research Report
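[Presentation] A study of error rules in automatic pronunciation evaluation of English spoken by Japanese speakers (2012)
- Author(s)
  佐藤慶,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Shinshu University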
- Year and Date
  2012-09-21
- Related Report
  2012 Annual Research Report 2012 Final Research Report
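[Presentation] A study of frame-weighted histogram equalization for speech recognition in noise (2012)
- Author(s)
  高橋郁也,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Shinshu University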
- Year and Date
  2012-09-19
- Related Report
  2012 Annual Research Report 2012 Final Research Report
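[Presentation] A study of speech recognition under reverberation using word graph combination (2012)
- Author(s)
  倉又俊輔,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kanagawa University, Yokohama Campus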
- Year and Date
  2012-03-13
- Related Report
  2012 Final Research Report
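[Presentation] A study of speech recognition under reverberation using word graph combination (2012)
- Author(s)
  倉又俊輔, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kanagawa University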
- Year and Date
  2012-03-13
- Related Report
  2011 Annual Research Report
[Presentation] Optimization of various parameters in unsupervised speaker adaptation (2012)
- Author(s)
  今野聡介,加藤正治,小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University
- Year and Date
  2012-03-09
- Related Report
  2012 Final Research Report
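[Presentation] A study of vowel substitution rules in automatic pronunciation evaluation (2012)
- Author(s)
  佐藤慶,加藤正治,小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University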
- Year and Date
  2012-03-09
- Related Report
  2012 Final Research Report
[Presentation] Improvement of histogram equalization for speech recognition in noise (2012)
- Author(s)
  高橋郁也,加藤正治,小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University
- Year and Date
  2012-03-09
- Related Report
  2012 Final Research Report
[Presentation] Optimization of various parameters in unsupervised speaker adaptation (2012)
- Author(s)
  今野聡介, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Yamagata University
- Year and Date
  2012-03-09
- Related Report
  2011 Annual Research Report
[Presentation] A study of vowel substitution rules in automatic pronunciation evaluation (2012)
- Author(s)
  佐藤慶, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Yamagata University
- Year and Date
  2012-03-09
- Related Report
  2011 Annual Research Report
[Presentation] Improvement of histogram equalization for speech recognition in noise (2012)
- Author(s)
  高橋郁也, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Yamagata University
- Year and Date
  2012-03-09
- Related Report
  2011 Annual Research Report
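[Presentation] A study of histogram equalization with a small amount of data (2011)
- Author(s)
  湊竜一,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Shimane University, Matsue Campus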
- Year and Date
  2011-09-20
- Related Report
  2012 Final Research Report
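[Presentation] A study of histogram equalization with a small amount of data (2011)
- Author(s)
  湊竜一, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Shimane University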
- Year and Date
  2011-09-20
- Related Report
  2011 Annual Research Report
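[Presentation] Performance improvement of unsupervised acoustic and language model adaptation (2011)
- Author(s)
  宮本太郎,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Waseda University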
- Year and Date
  2011-03-10
- Related Report
  2012 Final Research Report 2010 Annual Research Report
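[Presentation] A study on improving accuracy in automatic pronunciation evaluation of English spoken by Japanese speakers (2011)
- Author(s)
  久住大,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Waseda University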
- Year and Date
  2011-03-10
- Related Report
  2012 Final Research Report 2010 Annual Research Report
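[Presentation] A study of distances between phoneme models of English spoken by Japanese speakers and by American speakers (2010)
- Author(s)
  久住大,加藤正治,小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kansai University, Senriyama Campus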
- Year and Date
  2010-09-16
- Related Report
  2012 Final Research Report
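[Presentation] A study of distances between phoneme models of English spoken by Japanese speakers and by American speakers (2010)
- Author(s)
  久住大, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kansai University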
- Year and Date
  2010-09-16
- Related Report
  2010 Annual Research Report
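[Presentation] Lecture speech recognition based on quinphone HM-Nets (2010)
- Author(s)
  加藤正治,小坂哲夫,伊藤彰則,牧野正三
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kansai University, Senriyama Campus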
- Year and Date
  2010-09-14
- Related Report
  2012 Final Research Report
[Presentation] Lecture speech recognition based on quinphone HM-Nets (2010)
- Author(s)
  加藤正治, 小坂哲夫, 伊藤彰則, 牧野正三
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting
- Place of Presentation
  Kansai University
- Year and Date
  2010-09-14
- Related Report
  2010 Annual Research Report
[Presentation] Speech recognition in various noise environments using word graph combination (2010)
- Author(s)
  倉又俊輔,加藤正治,小坂哲夫
- Organizer
  IEICE Technical Report
- Place of Presentation
  Akiu Onsen, Sendai
- Year and Date
  2010-07-23
- Related Report
  2012 Final Research Report
[Presentation] Lecture speech recognition based on word graph combination using quinphone HM-Nets (2010)
- Author(s)
  加藤正治,小坂哲夫,伊藤彰則,牧野正三
- Organizer
  IEICE Technical Report
- Place of Presentation
  Kyushu University, Chikushi Campus
- Year and Date
  2010-06-18
- Related Report
  2012 Final Research Report
[Presentation] HMM-based speech synthesis using prosodic information of input speech
- Author(s)
  栗原大樹, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University
- Related Report
  2012 Annual Research Report
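[Presentation] Optimization of various parameters in unsupervised language adaptation by cross-validation
- Author(s)
  高木瑛, 加藤正治, 小坂哲夫
- Organizer
  IPSJ Tohoku Branch Workshop
- Place of Presentation
  Faculty of Engineering, Yamagata University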
- Related Report
  2012 Annual Research Report
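[Book] IEICE Knowledge Base, Group 2: Image, Sound, and Language, Part 7: Speech Recognition and Synthesis, "2-4 Speaker and Environment Adaptation" (原島博 et al., eds.) (2011)
- Author(s)
  小坂哲夫
- Total Pages
  3
- Publisher
  IEICE (The Institute of Electronics, Information and Communication Engineers)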
- Related Report
  2012 Final Research Report
[Book] "Improvement of Lecture Speech Recognition by Using Unsupervised Adaptation," E-Activity and Intelligent Web Construction: Effects of Social Design (2011)
- Author(s)
  Tetsuo Kosaka, Takashi Kusama, Masaharu Kato and Masaki Kohda (T. Matsuo and T. Fujimoto, eds.)
- Publisher
  Information Science Reference
- Related Report
  2012 Final Research Report
[Book] E-Activity and Intelligent Web Construction, "Improvement of Lecture Speech Recognition by Using Unsupervised Adaptation" (Chapter 16) (2011)
- Author(s)
  T. Matsuo et al. (eds.)
- Publisher
  IGI Global
- Related Report
  2011 Annual Research Report
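[Book] IEICE Knowledge Base, Group 2: Image, Sound, and Language, Part 7: Speech Recognition and Synthesis, "2-4 Speaker and Environment Adaptation," 小坂哲夫 (section author) (2011)
- Author(s)
  原島博 et al. (eds.)
- Total Pages
  4
- Publisher
  IEICE (The Institute of Electronics, Information and Communication Engineers)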
- Related Report
  2010 Annual Research Report
[Remarks] Kosaka Laboratory
- URL
  http://eieweb.yz.yamagata-u.ac.jp/~kosaka/
- Related Report
  2012 Annual Research Report
