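Proposal of a Structural Representation of Speech Integrating Linguistic and Paralinguistic Information, and Its Application to Speech Synthesis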

Research Project

Project/Area Number	19650036
Research Category	Grant-in-Aid for Exploratory Research
Allocation Type	Single-year Grants
Research Field	Perception information processing/Intelligent robotics
Research Institution	The University of Tokyo
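Principal Investigator	Nobuaki Minematsu (峯松信明), The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (90273333)
Co-Investigator(Kenkyū-buntansha)	Keikichi Hirose (広瀬啓吉), The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)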
Project Period (FY)	2007 – 2008
Project Status	Completed (Fiscal Year 2008)
Budget Amount	¥3,300,000 (Direct Cost: ¥3,300,000); Fiscal Year 2008: ¥1,500,000 (Direct Cost: ¥1,500,000); Fiscal Year 2007: ¥1,800,000 (Direct Cost: ¥1,800,000)
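Keywords	structural representation of speech / f-divergence / transformation invariants / space search / linguistic information and paralinguistic features / speech synthesis / vocal imitation / language acquisition / word Gestalt / speaker invariants / speech generation and synthesis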
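Research Abstract	The information carried by speech falls broadly into linguistic, paralinguistic, and nonlinguistic information. We have been proposing a method that separates out only the acoustic features corresponding to nonlinguistic information. Acoustic deformations of speech caused by age and gender, and those caused by recording and transmission equipment, can all be modeled mathematically as static mappings of the acoustic space. Representing and modeling speech with mapping invariants therefore enables speech processing that is invariant to such static deformations (transformations). We have proved that f-divergence, a distance measure between distributions, is invariant under any (invertible) transformation, and we have proposed capturing every acoustic event in an utterance as a distribution, measuring the distance between every pair of distributions (events), and representing the utterance, speaker-invariantly, as a distance matrix. Because a distance matrix determines a geometric shape, we call this the structural representation of speech. Since nonlinguistic information is stripped away, the representation carries only linguistic and paralinguistic information. In this project we studied a framework that generates speech by restoring to this structural representation the speaker's nonlinguistic information: gender, age, and body size (that is, vocal tract shape). In other words, linguistic and paralinguistic information is given as a structure, and the structure is converted to sound by supplying the length and shape of the vocal tract (nonlinguistic information). Concretely, we adopted a method that takes several already realized sound events as initial conditions and, with the structural representation as a constraint, localizes the subsequent sound events one after another in the acoustic space. If n events have already been localized, a hyper-ellipsoid is drawn around each of them, and the intersection of the n hyper-ellipsoids is where the next sound should be placed. We implemented this search problem on a computer and examined several speed-up algorithms, which made speech generation from structure feasible at a practical computational cost. This method differs substantially from conventional speech generation in that it derives sound starting from a speech representation (the structural representation) in which linguistic and paralinguistic information are mixed.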
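
The two ideas in the abstract, an invariant distance matrix and the placement of the next event from distance constraints, can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the project's implementation: events are modeled as Gaussians, the Bhattacharyya distance (a monotone function of the Hellinger f-divergence) stands in for the f-divergence, the static deformation is an invertible affine map, and the names `bhattacharyya`, `distance_matrix`, and `locate_next` are hypothetical. The localization step further assumes that all events share one covariance, so each constraint is a hyper-ellipsoid centered on a placed event and the intersection can be found by linear least squares instead of the search described above.

```python
import numpy as np

def _logdet(m):
    """Log of the absolute determinant of a square matrix."""
    return np.linalg.slogdet(m)[1]

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    cov_term = 0.5 * (_logdet(cov) - 0.5 * (_logdet(cov1) + _logdet(cov2)))
    return mean_term + cov_term

def distance_matrix(events):
    """Structure of an utterance: distances between all pairs of events."""
    n = len(events)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bhattacharyya(*events[i], *events[j])
    return d

def locate_next(means, cov, targets):
    """Mean of a new event whose distance to placed event i is targets[i].

    Assumes every event has covariance `cov`, so each constraint is a
    hyper-ellipsoid around a placed mean (a sphere after whitening).
    """
    chol = np.linalg.cholesky(cov)
    y = np.linalg.solve(chol, means.T).T          # whitened means, (n, dim)
    r2 = 8.0 * np.asarray(targets)                # squared radii when whitened
    # Subtract the first sphere equation from the others to get a linear system.
    lhs = 2.0 * (y[1:] - y[0])
    rhs = np.sum(y[1:] ** 2, axis=1) - np.sum(y[0] ** 2) - (r2[1:] - r2[0])
    sol, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return chol @ sol                             # back to the original space

rng = np.random.default_rng(0)
dim = 3

# Five hypothetical sound events with distinct means and covariances.
events = []
for _ in range(5):
    m = rng.normal(size=(dim, dim))
    events.append((rng.normal(size=dim), m @ m.T + np.eye(dim)))

# A fixed invertible affine map standing in for a static speaker/channel change.
a = np.array([[1.2, 0.3, 0.0], [0.0, 0.9, -0.4], [0.5, 0.0, 1.1]])
b = np.array([0.5, -1.0, 2.0])
moved = [(a @ mu + b, a @ cov @ a.T) for mu, cov in events]
invariance_gap = np.max(np.abs(distance_matrix(events) - distance_matrix(moved)))

# Place a new event from its distances to four already placed events.
shared_cov = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
placed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
true_next = np.array([0.7, -0.2, 1.1])
targets = [bhattacharyya(true_next, shared_cov, mu, shared_cov) for mu in placed]
location_error = np.linalg.norm(locate_next(placed, shared_cov, targets) - true_next)

print(invariance_gap, location_error)
```

Both printed values should be close to zero up to floating-point rounding: the distance matrix is unchanged by the affine map, and the new event is recovered from its distances to the placed events. With unequal covariances the constraints are no longer spheres after whitening, which is one reason an iterative search would be needed in the general case.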

Report

(2 results)

2008 Annual Research Report
2007 Annual Research Report

Research Products

(18 results)

Journal Article (9 results, of which 8 peer reviewed), Presentation (8 results), Book (1 result)

[Journal Article] Improvement of structure to speech conversion using iterative optimization (2009)
- Author(s)
  D. Saito, Y. Qiao, N. Minematsu, K. Hirose
- Journal Title
  Proc. Speech and Computer
  Pages: 174-179
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Optimal event search using a structural cost function: improvement of structure to speech conversion (2009)
- Author(s)
  D. Saito, Y. Qiao, N. Minematsu, K. Hirose
- Journal Title
  Proc. INTERSPEECH
  Pages: 2047-2050
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Structure to speech: speech generation based on infant-like vocal imitation (2008)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. INTERSPEECH
  Pages: 1837-1840
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] f-divergence is a generalized invariant measure between distributions (2008)
- Author(s)
  Y. Qiao, N. Minematsu
- Journal Title
  Proc. INTERSPEECH
  Pages: 1349-1352
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Holistic and prosodic representation of the segmental aspect of speech (2008)
- Author(s)
  N. Minematsu, T. Nishimura, D. Saito, S. Asakawa, Y. Qiao
- Journal Title
  Proc. Int. Conf. Speech Prosody
  Pages: 169-172
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Multi-stream parameterization for structural speech recognition (2008)
- Author(s)
  S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. ICASSP
  Pages: 4097-4100
- Related Report
  2008 Annual Research Report
- Peer Reviewed
[Journal Article] Structure to speech: speech generation based on infant-like vocal imitation (2008)
- Author(s)
  D. Saito, N. Minematsu, K. Hirose
- Journal Title
  Proc. INTERSPEECH
  Pages: 1837-1840
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Journal Article] Directional dependency of cepstrum on vocal tract length (2008)
- Author(s)
  D. Saito, R. Matsuura, S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  Proc. Int. Conf. Acoustics, Speech and Signal Processing
  Pages: 4485-4488
- Related Report
  2007 Annual Research Report
- Peer Reviewed
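[Journal Article] A fundamental study on speech generation from structural representation (2007)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, K. Hirose
- Journal Title
  IEICE Technical Committee on Speech, SP2007-80
  Pages: 55-60
- NAID
  110006449178
- Related Report
  2007 Annual Research Report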
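[Presentation] Improving the accuracy of a speech synthesis system from structural representation using a structural evaluation function (2009)
- Author(s)
  D. Saito, Y. Qiao, N. Minematsu, K. Hirose
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Shizuoka University
- Year and Date
  2009-11-01
- Related Report
  2008 Annual Research Report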
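[Presentation] An experimental study of speech and language conversion based on structural representation across two languages (2009)
- Author(s)
  見原隆介, D. Saito, N. Minematsu, K. Hirose
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Shizuoka University
- Year and Date
  2009-11-01
- Related Report
  2008 Annual Research Report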
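[Presentation] A study on improving the accuracy of speech synthesis from structural representation based on an iterative solution method (2009)
- Author(s)
  D. Saito, Y. Qiao, N. Minematsu, K. Hirose
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo Institute of Technology
- Year and Date
  2009-03-01
- Related Report
  2008 Annual Research Report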
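[Presentation] Cognitive abilities required for spoken language use and computational abilities built by spoken language engineering (2008)
- Author(s)
  N. Minematsu
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Waseda University
- Year and Date
  2008-12-09
- Related Report
  2008 Annual Research Report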
[Presentation] Transformation-invariant divergences and their general form (2008)
- Author(s)
  Y. Qiao, N. Minematsu
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Iwate Prefectural University
- Year and Date
  2008-07-01
- Related Report
  2008 Annual Research Report
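[Presentation] A physical interpretation of word Gestalt based on the invariant representation of speech, and an implementation of infants' vocal imitation based on it (2008)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, T. Nishimura, K. Hirose
- Organizer
  Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Asahikawa, Hokkaido
- Year and Date
  2008-06-15
- Related Report
  2008 Annual Research Report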
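[Presentation] A study on speech synthesis from structural representation and vocal imitation based on it (2008)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, K. Hirose
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  ACU (inter-university shared-use facility)
- Year and Date
  2008-06-01
- Related Report
  2008 Annual Research Report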
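[Presentation] A physical interpretation of word Gestalt based on the invariant representation of speech, and an implementation of infants' vocal imitation based on it (2008)
- Author(s)
  D. Saito, S. Asakawa, N. Minematsu, T. Nishimura, K. Hirose
- Organizer
  Annual Conference of the Japanese Society for Artificial Intelligence
- Place of Presentation
  Hokkaido
- Related Report
  2007 Annual Research Report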
[Book] "Consideration of infants' vocal imitation through modeling speech as timbre-based melody" in New Frontiers in Artificial Intelligence, LNAI 4914 (2008)
- Author(s)
  N. Minematsu, T. Nishimura
- Total Pages
  14
- Publisher
  Springer
- Related Report
  2008 Annual Research Report
