2016 Fiscal Year Annual Research Report

Development of high-level speech processing infrastructure based on static representations of auditory information

Research Project

Project/Area Number	15H02726
Research Institution	Wakayama University
Principal Investigator	Hideki Kawahara, Wakayama University, Joint-Use Facilities, Professor Emeritus (40294300)
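Co-Investigator(Kenkyū-buntansha)	Ryuichi Nisimura, Wakayama University, Faculty of Systems Engineering, Assistant Professor (00379611); Toshie Matsui, Wakayama University, Faculty of Systems Engineering, Assistant Professor (10510034); Hideki Banno, Meijo University, Faculty of Science and Technology, Associate Professor (20335003); Toshio Irino, Wakayama University, Faculty of Systems Engineering, Professor (20346331); Masanori Morise, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Assistant Professor (60510013); Ken-Ichi Sakakibara, Health Sciences University of Hokkaido, School of Rehabilitation Sciences, Associate Professor (80396168); Tomoki Toda, Nagoya University, Information Technology Center, Professor (90403328)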
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	speech analysis / speech synthesis / emotional speech / singing voice / disordered speech / speech information representation
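Outline of Annual Research Achievements	[Research purpose] This project aims to establish a technical foundation for advancing state-of-the-art speech research, together with a set of tools built on that foundation. [Research implementation plan] In FY2016, a breakthrough was achieved in the speech information representation that underlies this project. Specifically, the YANG vocoder, a new speech analysis and synthesis technique that can replace STRAIGHT and that grew out of a collaboration (not anticipated in the original plan) with Google's speech research group in the UK, triggered the discovery of a further new speech analysis method with a clear theoretical footing. In the YANG vocoder, the analysis of source information is structured in three stages: (1) periodicity detection, (2) tracking of the fundamental component, and (3) refinement of the fundamental frequency estimate and estimation of the aperiodic component by fundamental-frequency-adaptive time-axis warping. This structuring clarified the problem setting. Combining it with the very useful properties of a new cosine series, invented while pursuing a separate problem, led to the invention of a new method for computing instantaneous frequency and the aperiodic component. Combined with the wavelet transform, the method is applied to periodicity detection in the first stage; combined with the Fourier transform, it can be applied to refinement of the estimate and estimation of the aperiodic component in the third stage. In addition, efficient algorithms were developed for each case, making the method practical both for interactive environments that require real-time operation and for the analysis of large speech corpora. Some of these results have been reported at domestic meetings so that they reach society early, and papers have been submitted to influential flagship international conferences. To promote practical use, MATLAB code has also been released. Furthermore, as a basis for developing these information representations and highly transparent methods for manipulating them, development of SparkNG, a speech research support environment, has continued, and it is likewise released as MATLAB source code.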
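As a reminder of what "instantaneous frequency" refers to in the outline above, the following is a minimal Python sketch using the textbook analytic-signal approach: the signal is converted to its analytic form via the FFT, and the frequency is taken as the time derivative of the unwrapped phase. This is not the method developed in this project, which relies on a new cosine series combined with wavelet and Fourier transforms; the function names, sampling rate, and test tone below are assumptions chosen purely for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled
    (the standard discrete Hilbert-transform construction).
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0    # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, fs):
    """Estimate instantaneous frequency in Hz from the phase derivative."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    # First difference of the phase (radians per sample) converted to Hz.
    return np.diff(phase) * fs / (2.0 * np.pi)


# Illustrative input (assumed values): a 200 Hz cosine sampled at 16 kHz
# for 0.1 s, which spans a whole number of periods.
fs = 16000.0
f0 = 200.0
t = np.arange(int(0.1 * fs)) / fs
x = np.cos(2.0 * np.pi * f0 * t)

inst_f = instantaneous_frequency(x, fs)
median_f = float(np.median(inst_f))
print(f"median instantaneous frequency: {median_f:.3f} Hz")
```

For a clean sinusoid the estimate stays at the tone frequency. For real speech, the phase derivative of the full-band signal is not meaningful on its own, which is one reason analysis schemes first isolate the fundamental component before tracking it.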
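Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: The static information representation assumed when this project was proposed has been superseded by a better representation, following the discovery of a new information representation in FY2016. This is a leap that does not lie on the extension of the research path originally envisaged, and it allows the goals of the project to be achieved at a higher level. Moreover, because the work carried out so far has put the foundation for applications in place, a path is in sight for returning the results to society in an integrated form in the final year. In addition, WORLD and its component technology D4C, both listed in the original plan, were published as open-access papers. For these reasons, we judge that this project is progressing beyond the original plan.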
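Strategy for Future Research Activity	In the final year, the top priority is to reorganize the results of this project into a systematic framework based on the information representation newly discovered in FY2016. The overall concept remains the same as in the original plan, but its foundation will be replaced with a more advanced one. The application infrastructure and data accumulated so far can be used almost as they are, so this change raises no problems.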

Research Products
(19 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results) Presentation (14 results) (of which Int'l Joint Research: 6 results, Invited: 1 result) Remarks (3 results)

[Journal Article] D4C, a band-aperiodicity estimator for high-quality speech synthesis (2016)
- Author(s)
  Masanori Morise
- Journal Title
  Speech Communication
  Volume: 84 Pages: 57-65
- DOI
  10.1016/j.specom.2016.09.001
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] WORLD: a vocoder-based high-quality speech synthesis system for real-time applications (2016)
- Author(s)
  Masanori Morise, Fumiya Yokomori, Kenji Ozawa
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E99-D Pages: 1877-1884
- DOI
  10.1587/transinf.2015EDP7457
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Presentation] Aliasing-free Fujisaki-Ljungqvist model and its application to voice quality perception (2017)
- Author(s)
  Hideki Kawahara, Minoru Tsuzaki, Toshie Matsui, Toshio Irino, Ken-Ichi Sakakibara
- Organizer
  Acoustical Society of Japan, Auditory Research Meeting
- Place of Presentation
  Kyoto City University of Arts, Kyoto, Kyoto Prefecture
- Year and Date
  2017-03-27 – 2017-03-27
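[Presentation] On an aliasing-free glottal source model and extensions of an interactive speech production simulator (2017)
- Author(s)
  Hideki Kawahara, Ken-Ichi Sakakibara
- Organizer
  Acoustical Society of Japan, Spring Meeting
- Place of Presentation
  Meiji University Ikuta Campus, Kawasaki, Kanagawa Prefecture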
- Year and Date
  2017-03-15 – 2017-03-17
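[Presentation] A study on a method for expressing breathiness of singing voice using glottal source waveforms (2017)
- Author(s)
  Masahiro Itou, Hideki Banno, Kensaku Asahi
- Organizer
  Acoustical Society of Japan, Spring Meeting
- Place of Presentation
  Meiji University Ikuta Campus, Kawasaki, Kanagawa Prefecture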
- Year and Date
  2017-03-15 – 2017-03-17
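[Presentation] A WaveNet-based speech waveform synthesis method that takes the speech production process into account (2017)
- Author(s)
  Akira Tamamori, Tomoki Hayashi, Tomoki Toda, Kazuya Takeda
- Organizer
  IEICE / Acoustical Society of Japan, Speech Research Meeting
- Place of Presentation
  Okinawa Industry Support Center, Naha, Okinawa Prefecture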
- Year and Date
  2017-03-01 – 2017-03-01
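[Presentation] Revisiting aperiodic component estimation based on instantaneous frequency and group delay (2017)
- Author(s)
  Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno
- Organizer
  Information Processing Society of Japan, Special Interest Group on Music and Computer (SIGMUS)
- Place of Presentation
  Yamaha Corporation, Hamamatsu, Shizuoka Prefecture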
- Year and Date
  2017-02-27 – 2017-02-28
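[Presentation] Proposal and evaluation of a fundamental frequency estimation method that combines high noise robustness with high estimation accuracy (2016)
- Author(s)
  Masanori Morise
- Organizer
  IEICE Technical Report
- Place of Presentation
  NTT Musashino R&D Center, Mitaka, Tokyo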
- Year and Date
  2016-12-20 – 2016-12-21
[Presentation] Low delay statistical singing voice conversion with direct waveform modification based on spectral differential considering global variance (2016)
- Author(s)
  Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura
- Organizer
  ASA/ASJ Joint meeting
- Place of Presentation
  Honolulu, USA
- Year and Date
  2016-11-28 – 2016-12-02
- Int'l Joint Research
[Presentation] A study on acoustic feature representing breathiness of singing voice based on vocal-fold vibration modeling (2016)
- Author(s)
  Masahiro Itou, Hideki Banno, Kensaku Asahi
- Organizer
  ASA/ASJ Joint meeting
- Place of Presentation
  Honolulu, USA
- Year and Date
  2016-11-28 – 2016-12-02
- Int'l Joint Research
[Presentation] Realtime and interactive tools for speech and hearing science education (2016)
- Author(s)
  Hideki Kawahara
- Organizer
  ASA/ASJ Joint meeting
- Place of Presentation
  Honolulu, USA
- Year and Date
  2016-11-28 – 2016-12-01
- Int'l Joint Research
[Presentation] TUSK: A framework for overviewing the performance of F0 estimators (2016)
- Author(s)
  Masanori Morise, Hideki Kawahara
- Organizer
  Interspeech 2016
- Place of Presentation
  San Francisco, USA
- Year and Date
  2016-09-08 – 2016-09-12
- Int'l Joint Research
[Presentation] SparkNG: Interactive MATLAB tools for introduction to speech production, perception and processing fundamentals and application of the aliasing-free LF model component (2016)
- Author(s)
  Hideki Kawahara, Masanori Morise, Ken-Ichi Sakakibara, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino
- Organizer
  Interspeech 2016
- Place of Presentation
  San Francisco, USA
- Year and Date
  2016-09-08 – 2016-09-12
- Int'l Joint Research
[Presentation] The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016 (2016)
- Author(s)
  Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda
- Organizer
  Interspeech 2016
- Place of Presentation
  San Francisco, USA
- Year and Date
  2016-09-08 – 2016-09-12
- Int'l Joint Research
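[Presentation] Extension of the WORLD speech analysis and synthesis system for real-time speech synthesis, with an implementation example (2016)
- Author(s)
  Masanori Morise
- Organizer
  Information Processing Society of Japan, Special Interest Group on Music and Computer (SIGMUS)
- Place of Presentation
  Tokyo University of Science Noda Campus, Noda, Chiba Prefecture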
- Year and Date
  2016-07-30 – 2016-08-01
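[Presentation] Feature representations in sound information processing (2016)
- Author(s)
  Tomoki Toda
- Organizer
  Information Processing Society of Japan SIGMUS, Ongaku Symposium 2016
- Place of Presentation
  Tokai University Takanawa Campus, Minato, Tokyo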
- Year and Date
  2016-05-21 – 2016-05-22
- Invited
[Remarks] SparkNG: Matlab realtime speech tools
- URL
  http://www.wakayama-u.ac.jp/~kawahara/SparkNG/
[Remarks] WORLD
- URL
  http://ml.cs.yamanashi.ac.jp/world/
[Remarks] STRAIGHT, a speech analysis, modification and synthesis method
- URL
  http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_j.html
