2012 Fiscal Year Annual Research Report

Research on Speaker Recognition Robust to Intra-Speaker Speech Variation Based on Long-Term Recorded Speech Corpora

Research Project

Project/Area Number	21300060
Research Institution	Chiba University
Principal Investigator	黒岩眞吾, Chiba University, Graduate School of Advanced Integration Science, Professor (20333510)
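Co-Investigator(Kenkyū-buntansha)	柘植覚, Daido University, Faculty of Informatics, Associate Professor (00325250); 長内隆, National Research Institute of Police Science, Fourth Department of Forensic Science, Section Chief (70392264); 篠崎隆宏, Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Associate Professor (80447903)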
Project Period (FY)	2009-04-01 – 2014-03-31
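Keywords	Speaker recognition / Speaker verification / Speaker identification / Intra-speaker speech variation / Long-term recorded speech corpus / AWA-LTR / SVM / Forensic science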
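Research Abstract	(1A) Construction of a long-term speech corpus of many speakers: Recording in ordinary home environments, begun in the fiscal year before last, has continued. Speech data from 140 speakers (1 to 7 recording sessions each) have been collected so far. (1B) Construction of long- and short-term speech corpora of a small number of speakers: The recordings of one speaker (the principal investigator), made since 2003 once a week for about 15 minutes each in the morning, at midday, and in the evening, were continued. In addition, one year of data from this speaker, collected in FY2010, was released through the National Institute of Informatics under the name "AWA Long-Term Recording Speech Corpus (AWA-LTR)", and distribution has begun. (2) Extraction of speaker characteristics and modeling of intra-speaker variation: We investigated UBM construction methods, score normalization methods, and related techniques for speaker verification based on the GMM supervector (GMM-SVM) method, and built a high-accuracy GMM-SVM baseline. The component that applies the intra-speaker variation model to this method is currently being developed. We also showed that speech produced with abdominal voicing and with chest voicing differs even at the level of the power spectrum. (3) Investigation of new high-accuracy, robust speaker recognition methods unconstrained by computational cost: We found that pseudo-speaker models do not improve performance with GMM-SVM. On the other hand, we investigated a method that uses deep-learning neural networks to compensate for distortions, including reverberation, caused by factors such as the distance from the microphone, and demonstrated its potential. We also proposed a method that, by iterating verification and training, can detect every speaker present even when the speech of multiple speakers overlaps. (4) Investigation of the effectiveness of speaker verification in forensic science: We conducted native-language identification experiments using articulation rate, a measure with small intra-speaker variation but large differences between speakers and between dialects, and obtained good results with mora-based features. We also showed that the frequency of vowel devoicing may be usable for estimating a speaker's place of origin. We plan to examine applying these new features to computer-based speaker recognition as well.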
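Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The progress of each work item is as follows. (1) Construction of large-scale speech corpora usable for speaker recognition research: Collection of the long- and short-term speech corpora of a small number of speakers is proceeding as planned. In addition, one year of utterances by one speaker was compiled into a corpus and released through the National Institute of Informatics as the "AWA Long-Term Recording Speech Corpus (AWA-LTR)". Because this project's budget alone is insufficient to compile and release all of the collected data as corpora, we have requested support from the National Institute of Informatics and hope to carry this out in FY2013. Meanwhile, data collection for the long-term corpus of many speakers is continuing steadily at a scale commensurate with the budget. (2) Development of acoustic feature-space separation methods capable of separating intra-speaker variation, phonetic content, environmental factors, and speaker characteristics: While continuing to devise and develop methods, we are also building a baseline environment to keep pace with international trends. On the Japanese database originally planned, the planned performance has been achieved. Going forward, we need to add support for American English and other languages quickly and take part in international competitions. (3) Development of new high-accuracy, robust speaker recognition methods unconstrained by computational cost: We are investigating new techniques such as deep learning and pseudo-speaker model construction, as well as new tasks and problems such as simultaneous speech by multiple speakers and speech played back through loudspeakers. (4) Investigation, from a forensic science perspective, of the effectiveness and limits of voice-based personal authentication: Through surveys of various acoustic features based on a large-scale speech database, we have been clarifying its effectiveness and limitations. In addition, we have begun studying non-native speakers, and results exceeding the plan have been achieved.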
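Strategy for Future Research Activity	Since the research has so far proceeded almost as planned, we will vigorously carry out the following items described in the FY2013 plan. (1A) Construction of a long-term speech corpus of many speakers: We will organize the speech data collected to date and proceed with compiling it into a corpus. (1B) Construction of long- and short-term speech corpora of a small number of speakers: The principal investigator will continue recording once a week for about 15 minutes each in the morning, at midday, and in the evening. We will also organize all data recorded up to the fiscal year before last and aim to compile it into a corpus. Supplementary information will also be added, and the corpus will be released and distributed through the National Institute of Informatics. (2) Extraction of speaker characteristics and modeling of intra-speaker variation: We will apply the intra-speaker variation model, whose effectiveness was confirmed with the GMM-MAP method, to the i-vector method, and investigate methods that precisely model, subspace by subspace, the intra-speaker variation that is expected to differ from phoneme to phoneme. We will also collect speech containing speakers' voluntary voice modifications (e.g., speaking with emotion) and continue to investigate utterance-level feature parameters, in addition to short-time frame-level features, as parameters capable of capturing these modifications. (3) Investigation of new high-accuracy, robust speaker recognition methods unconstrained by computational cost: We will advance the study of the feature extraction method robust to environmental variation, based on deep-learning neural networks, that we proposed last year. We will also continue work on methods that can recognize each speaker simultaneously even when multiple speakers are talking at the same time. Furthermore, we will investigate score normalization and threshold-setting methods that integrate multiple techniques. (4) Investigation of the effectiveness of speaker verification in forensic science: For speaker recognition in forensic science, we will compare speaker recognition performance by human listening and visual inspection with machine recognition performance, and investigate the effect of human assistance. We will also continue work on methods for distinguishing native from non-native speakers and on gender identification methods.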
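
The GMM supervector (GMM-SVM) approach named in item (2) of the Research Abstract represents each utterance by one fixed-length vector derived from the UBM. The sketch below shows one common way to build such a vector, relevance-MAP adaptation of the UBM means to the utterance followed by concatenation of the adapted means. It is an illustration under stated assumptions, not the project's implementation: the diagonal-covariance UBM, the function names, and the relevance factor of 16 are assumptions, and the scaling often applied to supervectors before SVM training is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapted_supervector(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    relevance: float = 16.0,  # assumed relevance factor r (hypothetical default)
) -> Vector:
    """MAP-adapt the UBM means to one utterance and stack them into a supervector."""
    n_mix, dim = len(weights), len(means[0])
    counts = [0.0] * n_mix                        # n_c = sum_t gamma_c(t)
    firsts = [[0.0] * dim for _ in range(n_mix)]  # F_c = sum_t gamma_c(t) * x_t
    for x in frames:
        logp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                for c in range(n_mix)]
        top = max(logp)
        log_total = top + math.log(sum(math.exp(lp - top) for lp in logp))  # log-sum-exp
        for c in range(n_mix):
            gamma = math.exp(logp[c] - log_total)  # posterior of mixture c given frame x
            counts[c] += gamma
            for d in range(dim):
                firsts[c][d] += gamma * x[d]
    supervector: Vector = []
    for c in range(n_mix):
        alpha = counts[c] / (counts[c] + relevance)  # data-dependent adaptation weight
        for d in range(dim):
            # E_c[x]; falls back to the UBM mean when mixture c received no frames
            expected = firsts[c][d] / counts[c] if counts[c] > 0 else means[c][d]
            supervector.append(alpha * expected + (1.0 - alpha) * means[c][d])
    return supervector
```

Mixtures that see little data stay close to the UBM, so an utterance with no frames returns the UBM means unchanged. In a GMM-SVM system, one such supervector per utterance would then be used as the SVM input.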

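Item (4) of the Research Abstract reports that mora-based articulation-rate features worked well for native-language identification. Articulation rate is usually defined as the number of units produced per second of speech with pauses excluded. The sketch below computes morae per second from kana transcriptions of pause-delimited speech segments. The segment format, the kana-based mora-counting rule, and the function names are assumptions for illustration; the report does not state how pauses were detected or how the features were normalized.

```python
from typing import Iterable, Tuple

# Small kana that combine with the preceding kana into a single mora
# (e.g. きょ = 1 mora). Sokuon (っ/ッ), the moraic nasal (ん/ン) and the
# long-vowel mark (ー) are not listed, so each of them counts as one mora.
NON_MORAIC_SMALL_KANA = frozenset("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def is_kana(ch: str) -> bool:
    """True for hiragana/katakana letters and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def count_morae(kana: str) -> int:
    """Count morae in a kana transcription; non-kana characters are ignored."""
    return sum(1 for ch in kana if is_kana(ch) and ch not in NON_MORAIC_SMALL_KANA)


def articulation_rate(segments: Iterable[Tuple[str, float, float]]) -> float:
    """Morae per second of speech, with pauses excluded.

    Each segment is a hypothetical (kana_transcription, start_s, end_s) tuple
    for one stretch of speech between pauses; pauses are the gaps between
    segments and therefore never enter the denominator.
    """
    morae = 0
    speech_time = 0.0
    for kana, start, end in segments:
        if end <= start:
            raise ValueError(f"segment end {end} must be after start {start}")
        morae += count_morae(kana)
        speech_time += end - start
    if speech_time == 0.0:
        raise ValueError("no speech segments given")
    return morae / speech_time
```

For example, `articulation_rate([("きょうは", 0.0, 0.5), ("がっこう", 1.0, 1.5)])` counts 3 + 4 = 7 morae over 1.0 s of speech and returns 7.0; the 0.5 s pause between the segments is excluded, which is what distinguishes articulation rate from overall speaking rate.
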
Research Products
(20 results)

Journal Article (1 result), Presentation (19 results, of which invited: 1)

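[Journal Article] Music Impression Recognition Using Rhythm Features Based on Linear Predictive Coding of Energy Changes (2013)
- Author(s)
  三好真人
- Journal Title
  IPSJ Journal (Journal of Information Processing Society of Japan)
  Volume: 55, Pages: to be determined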
[Presentation] Voice Activity Detection for Speech Recognition in Environments with Public Address Announcements (2013)
- Author(s)
  紺野遼輔
- Organizer
  2013 IEICE General Conference
- Place of Presentation
  Gifu City
- Year and Date
  20130319-20130322
[Presentation] A Speaker Indexing Method That Accounts for Overlapping Utterances (2013)
- Author(s)
  大久保雅利
- Organizer
  2013 IEICE General Conference
- Place of Presentation
  Gifu City
- Year and Date
  20130319-20130322
[Presentation] A Study of Autoencoder-Based Dereverberation Using Multiple Analysis Window Lengths (2013)
- Author(s)
  石井敬章
- Organizer
  Acoustical Society of Japan, 2013 Spring Meeting
- Place of Presentation
  Hachioji City
- Year and Date
  20130313-20130315
[Presentation] A Comparative Study of Compact, Low-Power Word Detection Methods Implemented on FPGA (2013)
- Author(s)
  永谷悠
- Organizer
  Acoustical Society of Japan, 2013 Spring Meeting
- Place of Presentation
  Hachioji City
- Year and Date
  20130313-20130315
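[Presentation] Effect of Utterance Content on Native-Language Identification Using Articulation Rate (2013)
- Author(s)
  網野加苗
- Organizer
  Acoustical Society of Japan, 2013 Spring Meeting
- Place of Presentation
  Hachioji City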
- Year and Date
  20130313-20130315
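[Presentation] A Study of the Relationship between Vowel Devoicing Frequency and Speakers' Places of Origin (2013)
- Author(s)
  網野加苗
- Organizer
  Acoustical Society of Japan, 2013 Spring Meeting
- Place of Presentation
  Hachioji City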
- Year and Date
  20130313-20130315
[Presentation] AWA Long-Term Recording Speech Corpus (AWA-LTR) (2013)
- Author(s)
  柘植覚
- Organizer
  2013 International Workshop on Nonlinear Circuits, Communication and Signal Processing
- Place of Presentation
  Kailua-Kona, USA
- Year and Date
  20130304-20130307
[Presentation] Current Status and Challenges of Speaker Recognition Technology (2013)
- Author(s)
  網野加苗
- Organizer
  IEICE Technical Report, vol. 112, no. 450, SP2012-131
- Place of Presentation
  Nagoya City
- Year and Date
  20130228-20130301
- Invited
[Presentation] A Study of an Eye-Movement-Input Speech Synthesis Interface for Communication Support (2013)
- Author(s)
  房福明
- Organizer
  IEICE Technical Report, vol. 112, no. 426, WIT2012-38
- Place of Presentation
  Nagoya City
- Year and Date
  20130202-20130202
[Presentation] Impression Recognition from Speech Considering the Target Recognition Segment (2012)
- Author(s)
  内田正洋
- Organizer
  IEICE HCG Symposium 2012
- Place of Presentation
  Kumamoto City
- Year and Date
  20121218-20121220
[Presentation] Foreign accent identification using articulation rate of Japanese read speech (2012)
- Author(s)
  網野加苗
- Organizer
  14th Australasian International Conference on Speech Science & Technology
- Place of Presentation
  Sydney, Australia
- Year and Date
  20121206-20121206
[Presentation] Pipeline Decomposition of Speech Decoders and Their Implementation Based on Delayed Evaluation (2012)
- Author(s)
  篠崎隆宏
- Organizer
  APSIPA Annual Summit and Conference 2012
- Place of Presentation
  Hollywood, USA
- Year and Date
  20121203-20121206
[Presentation] Open Answer Scoring for S-CAT Automated Speaking Test System Using Support Vector Regression (2012)
- Author(s)
  小野豊
- Organizer
  APSIPA Annual Summit and Conference 2012
- Place of Presentation
  Hollywood, USA
- Year and Date
  20121203-20121206
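[Presentation] Comparison of Gender Identification Performance Using Vowels (2012)
- Author(s)
  長内隆
- Organizer
  Japanese Association of Forensic Science and Technology, 18th Annual Meeting
- Place of Presentation
  Minato-ku, Tokyo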
- Year and Date
  20121116-20121116
[Presentation] PCA Transformation Based Inter-session Variability Suppression for Text-Independent Speaker Identification (2012)
- Author(s)
  Lu, Haoze
- Organizer
  8th International Conference on Natural Language Processing and Knowledge Engineering
- Place of Presentation
  Hefei (Huangshan), China
- Year and Date
  20120920-20120924
[Presentation] A Study of Continuous Electrooculogram (EOG) Recognition for Communication Support (2012)
- Author(s)
  房福明
- Organizer
  Acoustical Society of Japan, 2012 Autumn Meeting
- Place of Presentation
  Nagano City
- Year and Date
  20120919-20120921
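[Presentation] Pipeline Decomposition of Speech Recognition Systems and an Implementation Method Using Delayed Evaluation (2012)
- Author(s)
  篠崎隆宏
- Organizer
  Acoustical Society of Japan, 2012 Autumn Meeting
- Place of Presentation
  Nagano City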
- Year and Date
  20120919-20120921
[Presentation] Performance Evaluation of Husky2, a Purely Functional Compact Decoder (2012)
- Author(s)
  深津澪
- Organizer
  Acoustical Society of Japan, 2012 Autumn Meeting
- Place of Presentation
  Nagano City
- Year and Date
  20120919-20120921
[Presentation] HMM Based Continuous EOG Recognition for Eye-input Speech Interface (2012)
- Author(s)
  房福明
- Organizer
  Interspeech 2012
- Place of Presentation
  Portland, USA
- Year and Date
  20120909-20120913
