Proposal of Speech Affordance Based on the Theorem of Structural Invariance and Construction of a Speech Recognition System Founded on It

Research Project

Project/Area Number	19024023
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	The University of Tokyo
Principal Investigator	峯松信明 The University of Tokyo, Graduate School of Engineering, Associate Professor (90273333)
Project Period (FY)	2007 – 2008
Project Status	Completed (Fiscal Year 2008)
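Budget Amount	¥7,500,000 (Direct Cost: ¥7,500,000); Fiscal Year 2008: ¥3,700,000 (Direct Cost: ¥3,700,000); Fiscal Year 2007: ¥3,800,000 (Direct Cost: ¥3,800,000)
Keywords	theorem of structural invariance / f-divergence / structural representation of speech / speech recognition / transformation functions and their estimation / CALL system / inter-distribution distance / dimension division / discriminative training / pronunciation training support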
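Research Abstract	This project studied a speech recognition system built on the Bhattacharyya distance, a feature that is invariant under any invertible transformation or mapping, whether linear or nonlinear. There are four main results: (1) the general form of the invariant was derived, that is, it was proved mathematically that an invariant measure must be an f-divergence; (2) treating speaker differences as transformations or mappings, the drawbacks of the GMM method now widely used to estimate the mapping function were clarified, and a new mapping estimation method that resolves them was proposed; (3) representations based on f-divergence generally have invariance that is too strong, which means a representation invariant only to the targeted group of transformations is needed, and this problem was solved by dividing the feature space into subspaces and building structures within each subspace; (4) as a practical application, a foreign-language pronunciation assessment system was built. Each result is described in more detail below.

The Bhattacharyya distance had already been proved invariant under any invertible and continuous transformation. This project proved that f-divergence, the general form of the Bhattacharyya distance, also satisfies this invariance, and further proved the necessity: an invariant measure must be an f-divergence. f-divergence is the general form of various inter-distribution distances, including the Bhattacharyya distance and the Kullback-Leibler divergence, so this result establishes the mathematical foundation of invariant representation in a more fundamental sense.

f-divergence is transformation-invariant, but what transformation function models a change of speaker? GMM-based estimation of the transformation function has been widely used for this problem. This project clarified the drawbacks of that conventional method and proposed a method that estimates the transformation function with a more appropriate optimization procedure. Experiments confirmed that the proposed method significantly reduces the estimation error.

On the other hand, speech representations based on f-divergence are so strongly invariant that, for example, different words can be judged identical. This happens because speaker differences and phonemic differences both deform the same physical quantity, which creates a kind of trade-off. What is ultimately wanted is a constrained invariance that holds only for speaker identity. Focusing on what group of transformations speaker changes form, this project proposed a method in which invariance holds only for a restricted group of transformations, and verified its effectiveness experimentally. In addition, because f-divergence measures the difference (interval) between two events, N events yield N(N-1)/2 measurements, so the parameter dimensionality grows easily. To reduce it, LDA and PCA were introduced, leading to a proposed feature representation called eigen structure.

Finally, as a practical application, a foreign-language pronunciation assessment system was built. English education is to begin in all public elementary schools within a few years. Speaking and listening will be the main focus, yet teachers who can teach pronunciation, for example, are very scarce. In view of this situation, a CALL (Computer Aided Language Learning) system was built using the structural representation of speech, which can robustly process even children's voices. The speech of more than 600 learners was assessed, and diagnostic reports called pronunciation charts (発音カルテ) were distributed.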
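The invariance and the N(N-1)/2 count mentioned in the abstract can be illustrated with a small sketch. It is not the project's implementation: it assumes each event is a univariate Gaussian, uses the standard closed form of the Bhattacharyya distance for that case, and checks invariance only under an affine map x -> a*x + b, which is one simple instance of an invertible transformation. The function names and the sample events are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya_1d(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between two univariate Gaussians."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term


def structure(events):
    """All pairwise distances between N events: N(N-1)/2 values."""
    return [bhattacharyya_1d(*p, *q) for p, q in combinations(events, 2)]


def affine(events, a, b):
    """Map every event through x -> a*x + b (a != 0, so the map is invertible)."""
    return [(a * mu + b, abs(a) * sigma) for mu, sigma in events]


# Hypothetical events: (mean, standard deviation) of four sound distributions.
events = [(0.0, 1.0), (2.0, 0.5), (-1.5, 2.0), (4.0, 1.2)]
original = structure(events)
warped = structure(affine(events, a=-3.0, b=7.0))

print(len(original))  # N(N-1)/2 = 6 for N = 4
print(max(abs(x - y) for x, y in zip(original, warped)))  # difference is at rounding level
```

The list returned by `structure` grows quadratically with the number of events, which is the dimensionality growth that the abstract addresses with LDA and PCA.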

Report

(2 results)

2008 Annual Research Report
2007 Annual Research Report

Research Products
(35 results)

Journal Article (13 results, all peer reviewed) / Presentation (19 results) / Book (1 result) / Remarks (2 results)

[Journal Article] Multi-stream parameterization for structural speech recognition (2008)
- Author(s)
  S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. Int. Conf. Acoustics, Speech, & Signal Processing
  Pages: 4097-4100
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Directional dependency of cepstrum on vocal tract length (2008)
- Author(s)
  D. Saito, R. Matsuura, S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. Int. Conf. Acoustics, Speech, & Signal Processing
  Pages: 4485-4488
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Unsupervised optimal phoneme segmentation: objectives, algorithm and comparisons (2008)
- Author(s)
  Y. Qiao, N. Shimomura, N. Minematsu
- Journal Title
  Proc. Int. Conf. Acoustics, Speech, & Signal Processing
  Pages: 3989-3992
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Training of pronunciation as learning of the sound system embedded in the target language (2008)
- Author(s)
  N. Minematsu
- Journal Title
  Proc. The 8th Phonetic Conference of China and Int. Symposium on Phonetic Frontiers (CD-ROM)
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Directional dependency of cepstrum on vocal tract length (2008)
- Author(s)
  N. Minematsu, T. Nishimura, D. Saito, S. Asakawa, Y. Qiao
- Journal Title
  Proc. Int. Conf. Acoustics, Speech, & Signal Processing
  Pages: 4485-4488
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Holistic and prosodic representation of the segmental aspect of speech (2008)
- Author(s)
  D. Saito, R. Matsuura, S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. Int. Conf. Speech Prosody
  Pages: 169-172
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Speech as timbre-based melody - What in parents' voices do infants imitate acoustically? - (2008)
- Author(s)
  N. Minematsu and T. Nishimura
- Journal Title
  Proc. Int. Conf. Language, Music, and the Mind (CD-ROM)
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Metric learning for unsupervised phoneme segmentation (2008)
- Author(s)
  Y. Qiao and N. Minematsu
- Journal Title
  Proc. INTERSPEECH
  Pages: 1060-1063
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] f-divergence is a generalized invariant measure between distributions (2008)
- Author(s)
  Y. Qiao and N. Minematsu
- Journal Title
  Proc. INTERSPEECH
  Pages: 1349-1352
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Structure to speech - speech generation based on infant-like vocal imitation - (2008)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, and K. Hirose
- Journal Title
  Proc. INTERSPEECH
  Pages: 1837-1840
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix (2008)
- Author(s)
  D. Saito, N. Minematsu, and K. Hirose
- Journal Title
  Proc. INTERSPEECH
  Pages: 1361-1364
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Speech recognition of isolated Japanese vowel sequences based on the structural representation of speech (2008)
- Author(s)
  村上隆夫, 峯松信明, 広瀬啓吉
- Journal Title
  IEICE Transactions (電子情報通信学会論文誌) J91-A-2
  Pages: 181-192
- NAID
  110007384598
- Related Report
  2007 Annual Research Report
- Peer Reviewed
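[Journal Article] Acoustic analysis of English learners' pronunciation based on the structural representation of speech (2007)
- Author(s)
  朝川智, 峯松信明, 広瀬啓吉
- Journal Title
  IEICE Transactions (電子情報通信学会論文誌) J90-D-5
  Pages: 1249-1262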
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Presentation] Mixture of probabilistic linear regression models for voice conversion (2009)
- Author(s)
  Y. Qiao, D. Saito, N. Minematsu
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Nara
- Year and Date
  2009-01-30
- Related Report
  2008 Annual Research Report
[Presentation] Speech recognition using local features with affine-transformation invariance (2008)
- Author(s)
  鈴木雅之, 喬宇, 峯松信明, 廣瀬啓吉
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Tokyo
- Year and Date
  2008-12-10
- Related Report
  2008 Annual Research Report
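[Presentation] Word speech recognition using the structural representation of speech and discriminant analysis (2008)
- Author(s)
  朝川智, 喬宇, 峯松信明, 廣瀬啓吉
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Tokyo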
- Year and Date
  2008-12-10
- Related Report
  2008 Annual Research Report
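[Presentation] Cognitive abilities required for spoken language use and computational abilities built by spoken language engineering (2008)
- Author(s)
  峯松信明
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Tokyo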
- Year and Date
  2008-12-09
- Related Report
  2008 Annual Research Report
[Presentation] Pronunciation clinic - which part of your pronunciation to correct at first to become like your model speaker? - (2008)
- Author(s)
  N. Minematsu, K. Kamata, M. Takazawa, K. Takeuchi, S. Asakawa, T. Makino, Y. Yamauchi, T. Nishimura, K. Hirose
- Organizer
  World CALL
- Place of Presentation
  Fukuoka
- Year and Date
  2008-08-07
- Related Report
  2008 Annual Research Report
[Presentation] Transformation-invariant divergences and their general form (2008)
- Author(s)
  喬宇, 峯松信明
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Iwate
- Year and Date
  2008-07-18
- Related Report
  2008 Annual Research Report
[Presentation] Parameter sharing and its effect in speech recognition using structural representation (2008)
- Author(s)
  松浦良, 齋藤大輔, 朝川智, 峯松信明, 廣瀬啓吉
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Iwate
- Year and Date
  2008-07-18
- Related Report
  2008 Annual Research Report
[Presentation] An experimental study of the structural representation of speech using spectral features (2008)
- Author(s)
  鈴木雅之, 朝川智, 喬宇, 峯松信明, 廣瀬啓吉
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Sapporo
- Year and Date
  2008-06-28
- Related Report
  2008 Annual Research Report
[Presentation] A study of speech synthesis from structural representation and of vocal imitation based on it (2008)
- Author(s)
  齋藤大輔, 朝川智, 峯松信明, 廣瀬啓吉
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Sapporo
- Year and Date
  2008-06-28
- Related Report
  2008 Annual Research Report
[Presentation] Structural assessment of language learners' pronunciation (2007)
- Author(s)
  N. Minematsu, et al.
- Organizer
  Proc. INTERSPEECH
- Place of Presentation
  Antwerp, Belgium
- Related Report
  2007 Annual Research Report
[Presentation] Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics (2007)
- Author(s)
  S. Asakawa, et al.
- Organizer
  Proc. INTERSPEECH
- Place of Presentation
  Antwerp, Belgium
- Related Report
  2007 Annual Research Report
[Presentation] Development of a spoken word recognizer without phonemic awareness - Is this machine a Dyslexia simulator? - (2007)
- Author(s)
  N. Minematsu, et al.
- Organizer
  Proc. The 2nd Riken Brain Science Institute and Oxford-Kobe Joint International Symposium
- Place of Presentation
  Kobe, Japan
- Related Report
  2007 Annual Research Report
[Presentation] Random discriminant structure analysis for continuous Japanese vowel recognition (2007)
- Author(s)
  Y. Qiao, et al.
- Organizer
  Proc. ASRU
- Place of Presentation
  Kyoto, Japan
- Related Report
  2007 Annual Research Report
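[Presentation] From reductionism to holism - an invitation to speech information processing that starts from the whole - (2007)
- Author(s)
  峯松信明, et al.
- Organizer
  IPSJ SIG on Spoken Language Processing
- Place of Presentation
  Japan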
- Related Report
  2007 Annual Research Report
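[Presentation] Is the ability to hear the isolated sound [a] and identify it as /a/ necessary for spoken language? (2007)
- Author(s)
  峯松信明, et al.
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Japan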
- Related Report
  2007 Annual Research Report
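[Presentation] Correlation analysis between absolute and relative assessment of learners' pronunciation of American English vowels (2007)
- Author(s)
  鎌田圭, et al.
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Japan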
- Related Report
  2007 Annual Research Report
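[Presentation] Is the ability to identify phonemes from isolated sounds necessary for spoken language use? (2007)
- Author(s)
  峯松信明, et al.
- Organizer
  Phonetic Society of Japan, National Convention
- Place of Presentation
  Japan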
- Related Report
  2007 Annual Research Report
[Presentation] A fundamental study of speech generation from structural representation (2007)
- Author(s)
  齋藤大輔, et al.
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Japan
- Related Report
  2007 Annual Research Report
[Presentation] A geometric consideration of the dependency of cepstrum on vocal tract length (2007)
- Author(s)
  齋藤大輔, et al.
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Japan
- Related Report
  2007 Annual Research Report
[Book] Lecture Notes in Artificial Intelligence (4914) (2008)
- Author(s)
  N. Minematsu and T. Nishimura
- Publisher
  Consideration of infants' vocal imitation through modeling speech as timbre-based melody
- Related Report
  2007 Annual Research Report
[Remarks]
- URL
  http://www.gavo.t.u-tokyo.ac.jp/~mine/japanese/paper/2007.html
- Related Report
  2007 Annual Research Report
[Remarks]
- URL
  http://www.gavo.t.u-tokyo.ac.jp/~mine/japanese/paper/2008.html
- Related Report
  2007 Annual Research Report
