2017 Fiscal Year Annual Research Report

Development of high-level speech processing infrastructure based on static representations of auditory information

Research Project

Project/Area Number	15H02726
Research Institution	Wakayama University
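Principal Investigator	Hideki Kawahara (河原英紀), Wakayama University, Intramural Joint-Use Facilities, etc., Professor Emeritus (40294300)
Co-Investigator (Kenkyū-buntansha)	Ryuichi Nishimura (西村竜一), Wakayama University, Faculty of Systems Engineering, Assistant Professor (00379611); Toshie Matsui (松井淑恵), Toyohashi University of Technology, Graduate School of Engineering, Associate Professor (10510034); Hideki Banno (坂野秀樹), Meijo University, Faculty of Science and Technology, Associate Professor (20335003); Toshio Irino (入野俊夫), Wakayama University, Faculty of Systems Engineering, Professor (20346331); Masanori Morise (森勢将雅), University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Associate Professor (60510013); Ken-Ichi Sakakibara (榊原健一), Health Sciences University of Hokkaido, School of Rehabilitation Sciences, Associate Professor (80396168); Tomoki Toda (戸田智基), Nagoya University, Information Technology Center, Professor (90403328)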
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	speech analysis / speech synthesis / emotional speech / singing voice / disordered speech / speech information representation
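Outline of Annual Research Achievements	[Research purpose] The aim is to establish a technical foundation for advancing state-of-the-art speech research, together with a set of tools built on it. [Research results] In FY2017 (Heisei 29), following on from FY2016, groundbreaking progress was again made in the information representation of speech on which this project is based. In speech research, deep-learning methods have spread rapidly, and when a specific service is the target it has become possible to build application systems without going through an intermediate information representation. This project also introduced the deep-learning-based WaveNet, which greatly improved the quality of the synthetic speech produced by application systems. However, the spread of deep learning is beginning to have a negative effect on academic research on speech. The methods discovered and invented in this project for analyzing precise information representations of speech are positioned to resolve this negative effect. The resulting method for precisely extracting speech source information, and the method for solving the problems that arise when generating a source signal from the analyzed source information, were accepted at flagship international conferences and have been highly regarded. In addition, to promote adoption across a wide range of the research community of time-varying, multi-attribute morphing of an arbitrary number of examples, which is being developed as a powerful tool for speech perception research, support tools were developed and distributed. These activities attracted attention, and we were invited to the National University of Singapore, which is growing in influence as a hub of speech research, where we gave a keynote at an international conference and laid the groundwork for future research collaboration. Furthermore, at the end of FY2017, the germ of an invention emerged that strengthens the mathematical foundation of the speech processing infrastructure we invented, which has become a de facto standard in speech research. Thus, in its final year, the project produced results that greatly exceeded plans and that lead to the next groundbreaking developments.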
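Research Progress Status	Not entered, because FY2017 (Heisei 29) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2017 (Heisei 29) is the final year of the project.
Remarks	The first two entries are portal sites hosted at Wakayama University for disseminating information in English and in Japanese, respectively. The last is a link to the GitHub repository of the software released as a project outcome.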

Research Products
(38 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 3 results) / Presentation (32 results) (of which Int'l Joint Research: 14 results, Invited: 2 results) / Remarks (3 results)

[Journal Article] Sound quality comparison among high-quality vocoders by using re-synthesized speech (2018)
- Author(s)
  Masanori Morise, Yusuke Watanabe
- Journal Title
  Acoustical Science and Technology
  Volume: 39 Pages: 263-265
- DOI
  10.1250/ast.39.263
- Peer Reviewed / Open Access
[Journal Article] Application of time-frequency representations of aperiodicity and instantaneous frequency for detailed analysis of filled pauses (2017)
- Author(s)
  Hideki Kawahara
- Journal Title
  Journal of the Phonetic Society of Japan
  Volume: 21 Pages: 63-73
- DOI
  10.24467/onseikenkyu.21.3_63
- Peer Reviewed / Open Access
[Journal Article] Pitfalls of digital signal processing (2017)
- Author(s)
  河原英紀
- Journal Title
  Journal of the Acoustical Society of Japan (日本音響学会誌)
  Volume: 73 Pages: 62-63
- DOI
  10.20697/jasj.73.9_592
- Peer Reviewed / Open Access
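[Presentation] Intelligibility of speech with superimposed babble noise and enhancement processing under simulated hearing loss (2018)
- Author(s)
  大橋成美, 余村直子, 山本克彦, 荒木章子, 木下慶介, 中谷智広, 入野俊夫
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)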
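[Presentation] Perceptual effects of quantizing the spectral envelope and aperiodicity indices for high-quality speech coding (2018)
- Author(s)
  宮下玄太, 森勢将雅
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)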
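[Presentation] Speech intelligibility under babble noise based on an amplitude envelope distortion index (2018)
- Author(s)
  山本克彦, 大橋成美, 入野俊夫, 荒木章子, 木下慶介, 中谷智広
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting (日本音響学会2018年春季研究発表会)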
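[Presentation] Notched-noise masking thresholds including low noise levels, and auditory filter estimation (2018)
- Author(s)
  横田健治, 入野俊夫, Roy D. Patterson
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting (日本音響学会2018年春季研究発表会)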
[Presentation] On an extension of the glottal source model using log-domain pulses (2018)
- Author(s)
  河原英紀, 榊原健一
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting (日本音響学会2018年春季研究発表会)
[Presentation] Time-series evaluation of men's preferences perceived from female speech (2018)
- Author(s)
  T. Shono, A. Otani, M. Morise, and K. Ozawa
- Organizer
  NCSP 2018
- Int'l Joint Research
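[Presentation] On the applicability of velvet noise and its variants to psychological and physiological research on hearing (2018)
- Author(s)
  河原英紀, 津崎実, 坂野秀樹, 森勢将雅, 松井淑恵, 入野俊夫
- Organizer
  Acoustical Society of Japan, Research Meeting on Hearing (日本音響学会聴覚研究会)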
[Presentation] Application of the velvet noise and its variant for synthetic speech and singing (2018)
- Author(s)
  Hideki Kawahara
- Organizer
  118th Meeting of the Special Interest Group on Music and Computer (第118回音楽情報科学研究会)
[Presentation] Incorporating absolute threshold and a cochlear noise floor into the GammaChirp model of masking (2018)
- Author(s)
  Toshio Irino, Kenji Yokota, Toshie Matsui, and Roy D. Patterson
- Organizer
  ARO 41st midwinter meeting
- Int'l Joint Research
[Presentation] Analysis of timbre changes caused by the expression of fatigue in acted speech (2018)
- Author(s)
  生野琢郎, 森勢将雅
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)
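[Presentation] Effect of the frame shift width of each parameter on sound quality in high-quality speech analysis/synthesis (2018)
- Author(s)
  宮下玄太, 森勢将雅
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)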
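[Presentation] On real-time analysis and display of indices representing the periodicity of the sound source (2017)
- Author(s)
  河原英紀, 榊原健一
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)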
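[Presentation] Proposal of a speech intelligibility prediction method based on the signal-to-distortion ratio in the modulation spectrum domain (2017)
- Author(s)
  山本克彦, 入野俊夫, 松井淑恵, 荒木章子, 木下慶介, 中谷智広
- Organizer
  IEICE 32nd Signal Processing Symposium (電子情報通信学会第32回信号処理シンポジウム)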
[Presentation] Accurate estimation of fo and aperiodicity based on periodicity detector residuals and deviations of phase derivatives (2017)
- Author(s)
  Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda
- Organizer
  APSIPA ASC 2017
- Int'l Joint Research
[Presentation] Realtime feedback of singing voice information for assisting students learning music therapy (2017)
- Author(s)
  Hideki Kawahara, Eri Haneishi, Kaori Hagiwara
- Organizer
  2017 International Conference on Orange Technologies
- Int'l Joint Research
[Presentation] Making speech tangible for better understanding of human speech communication (2017)
- Author(s)
  Hideki Kawahara
- Organizer
  The 21st International Conference on Asian Language Processing
- Int'l Joint Research / Invited
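[Presentation] Performance comparison of speech analysis/synthesis methods using re-synthesized speech (2017)
- Author(s)
  渡邊優介, 森勢将雅
- Organizer
  Acoustical Society of Japan 2017 Autumn Meeting (日本音響学会2017年秋季研究発表会)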
[Presentation] Characterization of subharmonic voices using phase derivatives (2017)
- Author(s)
  Hideki Kawahara, Ken-Ichi Sakakibara
- Organizer
  Pan-European Voice Conference
- Int'l Joint Research
[Presentation] Predicting speech intelligibility using a gammachirp envelope distortion index based on the signal-to-distortion ratio (2017)
- Author(s)
  Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, and Tomohiro Nakatani
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] The effect of spectral tilt on size discrimination of voiced speech sounds (2017)
- Author(s)
  Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] An auditory model of speaker size perception for voiced speech sounds (2017)
- Author(s)
  Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Harvest: A high-performance fundamental frequency estimator from speech signals (2017)
- Author(s)
  M. Morise
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation (2017)
- Author(s)
  Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, and Tomoki Toda
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds (2017)
- Author(s)
  Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] A new cosine series antialiasing function and its application to aliasing-free glottal source models for speech and singing synthesis (2017)
- Author(s)
  Hideki Kawahara, K. Sakakibara, H. Banno, M. Morise, T. Toda, T. Irino
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Low-dimensional representation of spectral envelope without deterioration for full-band speech analysis/synthesis system (2017)
- Author(s)
  M. Morise, G. Miyashita, and K. Ozawa
- Organizer
  Interspeech 2017
- Int'l Joint Research
[Presentation] Fundamental frequency revisited (2017)
- Author(s)
  河原英紀, 榊原健一
- Organizer
  IEICE Technical Committee on Speech (電子情報通信学会音声研究会)
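[Presentation] VOCODER revisited: why we do not want to preserve the phase of the original speech (2017)
- Author(s)
  河原英紀
- Organizer
  IEICE Technical Committee on Engineering Acoustics (電子情報通信学会応用音響研究会)
- Invited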
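[Presentation] Effect of exaggerated temporal fluctuation on the perceived humanness of singing voices (2017)
- Author(s)
  森勢将雅, 豊田裕一, 小澤賢司
- Organizer
  115th Meeting of the Special Interest Group on Music and Computer (第115回音楽情報科学研究会)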
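[Presentation] Implementation of a real-time fundamental frequency manipulation interface using high-quality speech analysis/synthesis (2017)
- Author(s)
  渡邊優介, 森勢将雅, 小澤賢司
- Organizer
  115th Meeting of the Special Interest Group on Music and Computer (第115回音楽情報科学研究会)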
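[Presentation] A computational auditory model for size perception of voiced sounds (2017)
- Author(s)
  瀧本恵理, 入野俊夫, 松井淑恵, Roy D. Patterson
- Organizer
  115th Meeting of the Special Interest Group on Music and Computer (第115回音楽情報科学研究会)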
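[Presentation] Effect of high-frequency emphasis on size perception of voiced sounds (2017)
- Author(s)
  松井淑恵, 入野俊夫, 山本航大, 河原英紀, Roy D. Patterson
- Organizer
  115th Meeting of the Special Interest Group on Music and Computer (第115回音楽情報科学研究会)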
[Remarks] Hideki Kawahara, Emeritus Professor
- URL
  http://www.wakayama-u.ac.jp/~kawahara/index-e.html
[Remarks] Hideki Kawahara, Emeritus Professor
- URL
  http://www.wakayama-u.ac.jp/~kawahara/
[Remarks] WORLD, a speech analysis, transformation, and synthesis system
- URL
  https://github.com/mmorise/World
