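Proposal of Speech Affordance Based on the Theorem of the Invariant Structure and Construction of a Speech Recognition System Founded on It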

Research Project

Project/Area Number	18049018
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	The University of Tokyo
Principal Investigator	峯松信明 (N. Minematsu), The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
Project Period (FY)	2006
Project Status	Completed (Fiscal Year 2006)
Budget Amount	¥3,300,000 (Direct Cost: ¥3,300,000); Fiscal Year 2006: ¥3,300,000 (Direct Cost: ¥3,300,000)
Keywords	theorem of the invariant structure / speech affordance / speech recognition / non-linguistic information / developmental dyslexia / multidimensional music
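Research Abstract	When linguistic or para-linguistic information is extracted from speech, the acoustic distortions introduced by differences in age, gender, and recording equipment are pure noise. The conventional way to cope with this noise has been to collect large amounts of speech data and build statistical acoustic models from them. Rather than solving the problem by collecting data, this study addressed it by mathematically formulating a way of modeling speech in which the dimensions that express this noise are eliminated (speech affordance). A speech stream is converted into a sequence of distributions, and the distance between every pair of distributions, including pairs that are far apart in time, is computed with a measure called the Bhattacharyya distance. Computing the distances between all pairs of events (that is, computing a distance matrix) is equivalent to specifying a geometric structure, and by using the Bhattacharyya distance as the distance measure, the space is warped in such a way that the structure is guaranteed to be invariant. Previous work examined the validity of this speech representation on sequences of isolated vowels; this fiscal year the examination was extended to continuous speech. In this setting the growing number of states causes problems, but assuming structural invariance in subspaces as well yielded a large improvement in the recognition rate. Specifically, in an experiment on a 120-word recognition task in which each word is formed by permuting the five Japanese vowels, rates of 93% at the word level and 97% at the vowel level were obtained. This shows that words can be recognized, and vowels identified, without using any absolute physical quantities of speech. Sounds have conventionally been identified from their absolute features (which is why acoustic distortion gets mixed in), and this result shows that speech can be recognized within an entirely different framework. In this case, very few speakers are needed for model training. Note that the method is, in principle, unable to identify isolated sounds. It is thus an algorithm that identifies words without identifying individual sounds, and a disorder that presents similar symptoms is developmental dyslexia, a condition in which difficulty appears only in reading and writing letters. This study may provide a model that explains this condition physically, and it led to a range of discussions at academic meetings on language disorders.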
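
The structure-based idea in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the project's implementation: the abstract does not say what form the distributions take or how structures are matched, so the sketch assumes one-dimensional Gaussians, invented vowel parameters (`VOWELS`), an affine map `x -> a*x + b` as a stand-in for speaker or equipment distortion, and a simple sum of absolute differences for matching distance matrices. All function names are hypothetical.

```python
import itertools
import math

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 1-D Gaussians N(mu, var)."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term

def distance_matrix(dists):
    """All-pairs distance matrix for a sequence of (mean, variance) tuples."""
    n = len(dists)
    return [[bhattacharyya_1d(*dists[i], *dists[j]) for j in range(n)]
            for i in range(n)]

def affine(dists, a, b):
    """Apply x -> a*x + b to every Gaussian (assumed distortion model)."""
    return [(a * mu + b, a * a * var) for mu, var in dists]

# Hypothetical 1-D "vowel" Gaussians as (mean, variance); values are invented.
VOWELS = {"a": (0.0, 1.0), "i": (2.0, 0.5), "u": (5.0, 2.0),
          "e": (9.0, 0.8), "o": (14.0, 1.5)}

# Every ordering of the five vowels is one word: 5! = 120 words.
VOCABULARY = ["".join(p) for p in itertools.permutations("aiueo")]

def word_structure(word):
    """Distance matrix (the 'structure') of a word's vowel sequence."""
    return distance_matrix([VOWELS[v] for v in word])

def recognize(observed):
    """Return the word whose template structure is closest to `observed`."""
    def mismatch(word):
        template = word_structure(word)
        return sum(abs(t - o)
                   for t_row, o_row in zip(template, observed)
                   for t, o in zip(t_row, o_row))
    return min(VOCABULARY, key=mismatch)

# A distorted utterance: absolute means and variances all change,
# but the distance matrix does not.
spoken = "ueoai"
distorted = affine([VOWELS[v] for v in spoken], a=3.0, b=-7.0)
recognized = recognize(distance_matrix(distorted))
print(recognized)
```

In this toy setting the distorted utterance is matched to the word that was spoken, because the Bhattacharyya distance between Gaussians is unchanged by an invertible affine map applied to both. Only the relations between sounds are used, so a single isolated vowel yields a 1x1 matrix containing zero and cannot be identified, in line with the limitation noted in the abstract.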

Report

(1 results)

2006 Annual Research Report

Research Products
(5 results)

[Journal Article] Speech recognition only with supra-segmental features - hearing speech as music - (2006)
- Author(s)
  N.Minematsu, T.Nishimura, T.Murakami, K.Hirose
- Journal Title
  Proc. Speech Prosody
  Pages: 589-594
- Related Report
  2006 Annual Research Report
[Journal Article] Para-linguistic information represented as distortion of the acoustic universal structure in speech (2006)
- Author(s)
  N.Minematsu, S.Asakawa, K.Hirose
- Journal Title
  Proc. ICASSP 5
  Pages: 85-88
- Related Report
  2006 Annual Research Report
[Journal Article] Theorem of the invariant structure and its derivation of speech Gestalt (2006)
- Author(s)
  N.Minematsu, T.Nishimura, K.Nishinari, K.Sakuraba
- Journal Title
  Proc. SRIV
  Pages: 47-52
- NAID
  10016435675
- Related Report
  2006 Annual Research Report
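[Journal Article] 音声の構造的表象を通して考察する失読症・自閉症の音声認知 (Speech cognition in dyslexia and autism, considered through the structural representation of speech) (2006)
- Author(s)
  峯松信明 (N. Minematsu), 櫻庭京子 (K. Sakuraba), 西村多寿子 (T. Nishimura)
- Journal Title
  電子情報通信学会音声研究会 (IEICE Technical Committee on Speech) SP2006-74
  Pages: 27-32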
- NAID
  110005717068
- Related Report
  2006 Annual Research Report
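[Journal Article] 音声の構造的表象を通して再考する幼児の音声模倣と言語獲得 (Infants' vocal imitation and language acquisition, reconsidered through the structural representation of speech) (2006)
- Author(s)
  峯松信明 (N. Minematsu), 西村多寿子 (T. Nishimura), 櫻庭京子 (K. Sakuraba)
- Journal Title
  人工知能学会AIチャレンジ研究会 (Japanese Society for Artificial Intelligence, AI Challenge study group) SIG-Challenge-0624-6
  Pages: 35-42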
- Related Report
  2006 Annual Research Report
