2015 Fiscal Year Annual Research Report

A Study on Speaker Diarization Robust to Diverse Conversation Styles in Multi-Party Settings

Research Project

Project/Area Number	25330210
Research Institution	Shizuoka University
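Principal Investigator	西田昌史 (Masafumi Nishida), Shizuoka University, Faculty of Informatics, Associate Professor (80361442)
Co-Investigator (Kenkyū-buntansha)	山本誠一 (Seiichi Yamamoto), Doshisha University, Faculty of Science and Engineering, Professor (20374100)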
Project Period (FY)	2013-04-01 – 2016-03-31
Keywords	multi-party conversation / speaker clustering / phonemic and speaker characteristics / utterance style / principal component analysis / GMM
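Outline of Annual Research Achievements	In this study, to achieve speaker clustering that is robust to diverse utterance styles, we focused on the phonemic information and the speaker information contained in speech data. If the two can be separated, suppressing the phonemic information should emphasize the speaker information. For speaker identification and speaker verification, a method has already been proposed and shown to be effective: among the subspaces obtained by principal component analysis, the high-variance subspace is regarded as representing phonemic information and the low-variance subspace as representing speaker information, the speech data are projected onto a speaker space in which phonemic information is suppressed, and a GMM (Gaussian Mixture Model) is trained in that speaker space. Conventional speaker clustering methods, however, have not treated the problem from the viewpoint of separating phonemic and speaker information. Moreover, in multi-party conversation the duration differs from utterance to utterance, so the variation in the phonemes contained in each utterance is expected to affect the construction of speaker models. We therefore proposed a speaker clustering method that builds, for each utterance, a speaker space that takes within-utterance variance into account and trains a GMM, a statistical model, in that space to reduce the influence of phonemes. Using lecture speech from the Corpus of Spontaneous Japanese, we segmented the speech at silent intervals of arbitrary length and arranged the utterances of several speakers in random order, creating simulated discussion speech data with 5 and 10 speakers per set. In evaluation experiments on these data, the proposed method improved clustering performance over both a conventional method based on BIC (Bayesian Information Criterion) and a standard GMM-based method, achieving a high accuracy of over 90% for both 5 and 10 speakers. The proposed method thus achieved speaker clustering that is robust to diverse utterance styles.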
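The projection step described in the outline can be illustrated with a minimal sketch. It is not the authors' implementation: the report gives no feature type, dimensionality, or number of suppressed components, so the 12-dimensional toy features, the value `n_suppress=3`, and the function names `speaker_subspace` and `project` are all assumptions made for illustration. The sketch only shows how high-variance principal directions (treated as phonemic) can be dropped from one utterance's frames; training a GMM on the projected frames and clustering the utterances are left out.

```python
import numpy as np

def speaker_subspace(frames, n_suppress):
    """Return (mean, basis) for a subspace that drops the n_suppress
    highest-variance principal directions of `frames` (shape: T x D).

    Hypothetical helper: the number of suppressed directions is an
    assumption, not a value taken from the report.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    cov = centered.T @ centered / max(len(frames) - 1, 1)
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so the first columns of eigvecs are the low-variance directions.
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = frames.shape[1] - n_suppress
    basis = eigvecs[:, :keep]
    return mean, basis

def project(frames, mean, basis):
    """Project frames onto the retained low-variance directions."""
    return (frames - mean) @ basis

# Toy "utterance": 200 frames of 12-dimensional features in which three
# dimensions are given an artificially large variance (assumed data).
rng = np.random.default_rng(0)
scales = np.array([10.0, 8.0, 6.0] + [1.0] * 9)
frames = rng.normal(size=(200, 12)) * scales

mean, basis = speaker_subspace(frames, n_suppress=3)
projected = project(frames, mean, basis)
print(projected.shape)  # (200, 9)
```

In a fuller pipeline, a GMM would then be fitted to `projected` for each utterance or cluster, which is the part of the method this sketch deliberately omits.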

Research Products
(8 results)

Journal Article (1 result) (of which Peer Reviewed: 1, Open Access: 1) / Presentation (7 results) (of which Int'l Joint Research: 4)

[Journal Article] Speech Recognition of English by Japanese using Lexicon Represented by Multiple Reduced Phoneme Sets (2015)
- Author(s)
  X. Wang, S. Yamamoto
- Journal Title
  Trans. IEICE
  Volume: Vol.E98-D, No. 12; Pages: 2271-2279
- DOI
  10.1587/transinf.2015EDP7061
- Peer Reviewed / Open Access
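[Presentation] A Study on Robust Speech Recognition Based on Multiple Sources of Acoustic Information (2016)
- Author(s)
  林升柯, 西田昌史, 西村雅史
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)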
- Year and Date
  2016-03-09 – 2016-03-11
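[Presentation] A Study on a Simple Non-Invasive System for Recognizing Physical Condition (2016)
- Author(s)
  安藤純平, 西田昌史, 西村雅史
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)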
- Year and Date
  2016-03-09 – 2016-03-11
[Presentation] Daily Activity Recognition Based on Acoustic Signals and Acceleration Signals Estimated with Gaussian Process (2015)
- Author(s)
  M. Nishida, N. Kitaoka, K. Takeda
- Organizer
  APSIPA
- Place of Presentation
  Hong Kong (China)
- Year and Date
  2015-12-16 – 2015-12-19
- Int'l Joint Research
[Presentation] Utterance Interval Estimation in Multi-Party Conversation Using Throat Microphones (2015)
- Author(s)
  大高祥裕, 西田昌史, 西村雅史
- Organizer
  13th Workshop on Informatics
- Place of Presentation
  Meijo University (Nagoya, Aichi)
- Year and Date
  2015-12-05
[Presentation] Quantitative analyses of Gaze Activity during Silence: Comparison between Native-language and Second-language Conversations (2015)
- Author(s)
  I. Umata, T. Tanizoe, K. Ijuin, S. Yamamoto
- Organizer
  EAP Cogsci
- Place of Presentation
  Torino (Italy)
- Year and Date
  2015-09-25 – 2015-09-27
- Int'l Joint Research
[Presentation] Eye Gaze Analyses in L1 and L2 Conversations: Difference in Interaction Structure (2015)
- Author(s)
  K. Ijuin, Y. Horiuchi, I. Umata, S. Yamamoto
- Organizer
  TSD
- Place of Presentation
  Plzen (Czech Republic)
- Year and Date
  2015-09-14 – 2015-09-17
- Int'l Joint Research
[Presentation] Daily Activity Recognition Based on DNN Using Environmental Sound and Acceleration Signals (2015)
- Author(s)
  T. Hayashi, M. Nishida, N. Kitaoka, K. Takeda
- Organizer
  EUSIPCO
- Place of Presentation
  Nice (France)
- Year and Date
  2015-08-31 – 2015-09-04
- Int'l Joint Research
