2010 Fiscal Year Annual Research Report

Development of High-Accuracy Spontaneous Speech Recognition Technology

Research Project

Project/Area Number	22500144
Research Institution	Yamagata University
Principal Investigator	小坂哲夫 (Tetsuo Kosaka), Yamagata University, Graduate School of Science and Engineering, Associate Professor (50359569)
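Keywords	Spontaneous speech recognition / Acoustic model / Language model / Speaker indexing / Context-dependent phone model / Cross-adaptation / Speaker-class acoustic model / Speaker vector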
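Research Abstract	1. Refinement of acoustic models: Context-dependent phone models are used to refine acoustic models. Triphones are the usual choice, but quinphones, which distinguish the two preceding and the two following phonemes, give a further performance gain. With quinphones, however, the optimal number of states differs from utterance to utterance, for example because of differences in speaking rate. To address this, we proposed a method that optimizes the number of states automatically by using word graph combination. For the problem of speaker variability, we also showed that using speaker-class acoustic models improves recognition performance.

2. Improving language model accuracy: Increasing the amount of training text is effective for improving a language model, and using text from the Web is one way to do so. We examined which kinds of Web text are effective to select for spontaneous speech recognition. Adapting the language model to the task also improves accuracy, and we found that applying cross-adaptation in that case yields a further performance gain.

3. Speaker indexing: We investigated speaker indexing based on speaker vectors. We showed that, for the acoustic model used to generate speaker vectors, modeling that takes phonemes into account is effective. We also found that, when noise is present, adding axes that represent the noise to the axes of the speaker vector is effective.

Items 1 and 2 contribute to improving speech recognition performance itself. When speaker adaptation is applied in situations where several speakers are talking, such as meeting speech, the speakers must first be classified. Item 3 is an essential technology for this purpose, and better indexing performance is expected to lead to better speaker adaptation performance.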
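The difference between triphone and quinphone context in item 1 can be illustrated with a minimal sketch. It only generates context-dependent labels from a phoneme sequence and is not the project's implementation; the function name `context_labels`, the boundary symbol `sil`, and the `l-c+r` label notation extended to two neighbors on each side are assumptions made for illustration.

```python
from typing import List

PAD = "sil"  # assumed boundary symbol for utterance edges (hypothetical choice)


def context_labels(phones: List[str], width: int) -> List[str]:
    """Return one context-dependent label per phoneme.

    width=1 gives triphone-style labels (one neighbor on each side),
    width=2 gives quinphone-style labels (two neighbors on each side).
    """
    padded = [PAD] * width + list(phones) + [PAD] * width
    labels = []
    for i in range(width, len(padded) - width):
        left = padded[i - width:i]           # preceding context phonemes
        right = padded[i + 1:i + 1 + width]  # following context phonemes
        labels.append("-".join(left + [padded[i]]) + "+" + "+".join(right))
    return labels


phones = ["k", "o", "s", "a", "k", "a"]
triphones = context_labels(phones, 1)
quinphones = context_labels(phones, 2)

for p, t, q in zip(phones, triphones, quinphones):
    print(f"{p:>2}  {t:<12} {q}")
```

Each phoneme gets a label that depends on a wider window in the quinphone case, so the number of distinct units to model grows, which is one reason model topology becomes harder to fix in advance.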

Research Products
(13 results)

Journal Article (8 results, of which Peer Reviewed: 6 results), Presentation (4 results), Book (1 result)

[Journal Article] Lecture Speech Recognition Using Discrete-Mixture HMMs (2011)
- Author(s)
  Tetsuo Kosaka, Akiyoshi Yamamoto, Takuya Kumakura, Masaharu Kato, Masaki Kohda
- Journal Title
  IEEJ Transactions on Electrical and Electronic Engineering
  Volume: Vol.6, No.1 Pages: 23-29
- Peer Reviewed
[Journal Article] Speaker Vector-Based Verification by Phonetic Class-Based Modeling (2011)
- Author(s)
  Tetsuo Kosaka, Naoki Tadokoro, Masaharu Kato, Masaki Kohda
- Journal Title
  Journal of Information Assurance and Security
  Volume: Vol.6, No.3 Pages: 186-194
- Peer Reviewed
[Journal Article] Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition (2010)
- Author(s)
  Tetsuo Kosaka, Yuui Takeda, Takashi Ito, Masaharu Kato, Masaki Kohda
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: Vol.E93-D, No.9 Pages: 2363-2369
- Peer Reviewed
[Journal Article] Speech Recognition in Noise by Using Word Graph Combinations (2010)
- Author(s)
  Shunsuke Kuramata, Masaharu Kato, Tetsuo Kosaka
- Journal Title
  Proc. of International Congress on Acoustics 2010
  Volume: CD-ROM
- Peer Reviewed
[Journal Article] Speaker Adaptation Based on System Combination Using Speaker-Class Models (2010)
- Author(s)
  Tetsuo Kosaka, Takashi Ito, Masaharu Kato, Masaki Kohda
- Journal Title
  Proc. of Interspeech 2010
  Volume: CD-ROM Pages: 546-549
- Peer Reviewed
[Journal Article] Lecture Speech Recognition by Combining Word Graphs of Various Acoustic Models (2010)
- Author(s)
  Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Kato
- Journal Title
  Proc. of Interspeech 2010
  Volume: CD-ROM Pages: 2978-2981
- Peer Reviewed
[Journal Article] Lecture Speech Recognition Based on Word Graph Combination Using Quinphone HM-Net (in Japanese) (2010)
- Author(s)
  加藤正治, 小坂哲夫, 伊藤彰則, 牧野正三
- Journal Title
  IEICE Technical Report
  Volume: SP2010-28 Pages: 37-42
[Journal Article] Speech Recognition in Various Noise Environments Using Word Graph Combination (in Japanese) (2010)
- Author(s)
  倉又俊輔, 加藤正治, 小坂哲夫
- Journal Title
  IEICE Technical Report
  Volume: SP2010-41 Pages: 37-42
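[Presentation] Performance Improvement of Unsupervised Acoustic and Language Model Adaptation (in Japanese) (2011)
- Author(s)
  宮本太郎, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan Meeting
- Place of Presentation
  Waseda University
- Year and Date
  2011-03-10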
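[Presentation] A Study on Improving Accuracy in Automatic Pronunciation Scoring of English Spoken by Japanese Speakers (in Japanese) (2011)
- Author(s)
  久住大, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan Meeting
- Place of Presentation
  Waseda University
- Year and Date
  2011-03-10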
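[Presentation] A Study of Distances Between Phoneme Models of Japanese-Accented English and American English (in Japanese) (2010)
- Author(s)
  久住大, 加藤正治, 小坂哲夫
- Organizer
  Proceedings of the Acoustical Society of Japan Meeting
- Place of Presentation
  Kansai University
- Year and Date
  2010-09-16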
[Presentation] Lecture Speech Recognition Based on Quinphone HM-Net (in Japanese) (2010)
- Author(s)
  加藤正治, 小坂哲夫, 伊藤彰則, 牧野正三
- Organizer
  Proceedings of the Acoustical Society of Japan Meeting
- Place of Presentation
  Kansai University
- Year and Date
  2010-09-14
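[Book] IEICE Knowledge Base, Group: Image, Sound, and Language, Part 7: Speech Recognition and Synthesis, "2-4 Speaker and Environment Adaptation," 小坂哲夫 (contributing author) (in Japanese) (2011)
- Author(s)
  原島博 et al. (eds.)
- Total Pages
  4
- Publisher
  The Institute of Electronics, Information and Communication Engineers (IEICE)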
