2016 Fiscal Year Annual Research Report

Development of augmented speech production techniques based on combination of statistical approaches and speech production modeling approaches

Research Project

Project/Area Number	26280060
Research Institution	Nagoya University
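Principal Investigator	Tomoki Toda, Nagoya University, Information Technology Center, Professor (90403328)
Co-Investigator(Kenkyū-buntansha)	Hirokazu Kameoka, Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Media Information Laboratory, Senior Research Scientist / Distinguished Researcher (20466402); Satoshi Nakamura, Nara Institute of Science and Technology, Graduate School of Information Science, Professor (30263429); Hiroshi Saruwatari, The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974)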
Project Period (FY)	2014-04-01 – 2017-03-31
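Keywords	Functional augmentation / Speech synthesis / Voice conversion / Signal processing / Statistical processing
Outline of Annual Research Achievements	To remove barriers in speech communication caused by physical and bodily constraints, we aimed to build fundamental voice conversion technology that can be used alongside the existing speech production process, together with applied technologies that augment speech production capabilities, and addressed the following tasks. 1. Statistical voice conversion with control over speech-organ movements: by combining statistical voice quality conversion that supports manipulation of articulatory movements with statistical prosody conversion that supports manipulation of the sound-source-generating organs, we built a statistical voice conversion technique with speech-organ movement control. We also worked on improving a prosody conversion method based on speech waveform modification and a high-accuracy method for modeling speech feature time sequences. 2. Multiple applied technologies that augment speech production: as augmentation technologies, we built and evaluated speaking-aid technology for people with vocal disorders, foreign-language speech generation, body-conducted speech enhancement, and voice changer technology. For the speaking aid, we proposed a method specialized for low-delay prediction and demonstrated its effectiveness. For foreign-language speech generation, we proposed a method that improves the naturalness of English speech produced by Japanese speakers while preserving speaker individuality. For body-conducted speech enhancement, we devised an external noise suppression technique suited to serve as pre-processing for the statistical body-conducted speech enhancement stage. For the voice changer, we submitted a conversion system based on the proposed technique to the international evaluation campaign Voice Conversion Challenge 2016, where it received the highest performance rating among the 17 participating institutions. 3. Construction of a synchronized articulatory-movement and speech database: toward building the database, we organized the data recorded up to the previous fiscal year. 4. We compiled these results and gave numerous presentations in Japan and abroad. The results were highly regarded, earning a total of four awards in Japan and leading to eight invited talks.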
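Research Progress Status	Not entered, because FY2016 (Heisei 28) is the final fiscal year of the project.
Strategy for Future Research Activity	Not entered, because FY2016 (Heisei 28) is the final fiscal year of the project.
Causes of Carryover	Not entered, because FY2016 (Heisei 28) is the final fiscal year of the project.
Expenditure Plan for Carryover Budget	Not entered, because FY2016 (Heisei 28) is the final fiscal year of the project.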

Research Products
(37 results)

Journal Article (15 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 15 results, Acknowledgement Compliant: 14 results, Open Access: 7 results) / Presentation (22 results) (of which Int'l Joint Research: 2 results, Invited: 8 results)

[Journal Article] Discriminative non-negative matrix factorization with majorization-minimization (2017)
- Author(s)
  Li Li, Hirokazu Kameoka, Shoji Makino
- Journal Title
  
  Proceedings of HSCMA
  
  Volume: - Pages: 141-145
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] A noise suppression method for body-conducted soft speech based on non-negative tensor factorization of air- and body-conducted signals (2017)
- Author(s)
  Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda
- Journal Title
  
  Proceedings of ICASSP
  
  Volume: - Pages: 4960-4964
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] A first introduction to voice conversion (2016)
- Author(s)
  Tomoki Toda
- Journal Title
  
  The Journal of the Acoustical Society of Japan
  
  Volume: 72 (6) Pages: 324-331
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Post-filters to modify the modulation spectrum for statistical parametric speech synthesis (2016)
- Author(s)
  Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 24 (4) Pages: 755-767
- DOI
  10.1109/TASLP.2016.2522655
- Peer Reviewed / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models (2016)
- Author(s)
  Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E99-D (10) Pages: 2490-2498
- DOI
  10.1587/transinf.2016SLP0020
- Peer Reviewed
[Journal Article] Improvements of voice timbre control based on perceived age in singing voice conversion (2016)
- Author(s)
  Kazuhiro Kobayashi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E99-D (11) Pages: 2767-2777
- DOI
  10.1587/transinf.2016EDP7234
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Non-native text-to-speech preserving speaker individuality based on partial correction of prosodic and phonetic characteristics (2016)
- Author(s)
  Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E99-D (12) Pages: 3132-3139
- DOI
  10.1587/transinf.2016EDP7231
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution (2016)
- Author(s)
  Naoki Murata, Hirokazu Kameoka, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, Shoichi Koyama, Hiroshi Saruwatari
- Journal Title
  
  Proceedings of EUSIPCO
  
  Volume: - Pages: 1648-1652
- DOI
  10.1109/EUSIPCO.2016.7760528
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Real-time vibration control of an electrolarynx based on statistical F0 contour prediction (2016)
- Author(s)
  Kou Tanaka, Tomoki Toda, Graham Neubig, Satoshi Nakamura
- Journal Title
  
  Proceedings of EUSIPCO
  
  Volume: - Pages: 1333-1337
- DOI
  10.1109/EUSIPCO.2016.7760465
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model (2016)
- Author(s)
  Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: - Pages: 953-957
- DOI
  10.21437/Interspeech.2016-1196
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] The Voice Conversion Challenge 2016 (2016)
- Author(s)
  Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: - Pages: 1632-1636
- DOI
  10.21437/Interspeech.2016-1066
- Peer Reviewed / Open Access / Int'l Joint Research / Acknowledgement Compliant
[Journal Article] The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016 (2016)
- Author(s)
  Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: - Pages: 1667-1671
- DOI
  10.21437/Interspeech.2016-970
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech (2016)
- Author(s)
  Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari
- Journal Title
  
  Proceedings of INTERSPEECH
  
  Volume: - Pages: 3753-3757
- DOI
  10.21437/Interspeech.2016-1286
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Nonaudible murmur enhancement based on statistical voice conversion and noise suppression with external noise monitoring (2016)
- Author(s)
  Yusuke Tajiri, Tomoki Toda
- Journal Title
  
  Proceedings of 9th ISCA Speech Synthesis Workshop (SSW9)
  
  Volume: - Pages: 54-60
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential (2016)
- Author(s)
  Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura
- Journal Title
  
  Proceedings of SLT
  
  Volume: - Pages: 693-700
- DOI
  10.1109/SLT.2016.7846338
- Peer Reviewed / Open Access / Acknowledgement Compliant
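[Presentation] Phrase and accent command estimation for electrolaryngeal speech based on a probabilistic model of the F0 contour generation process (2017)
- Author(s)
  Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)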
- Year and Date
  2017-03-15 – 2017-03-17
[Presentation] Acoustic-to-articulatory inversion mapping with variational latent trajectory Gaussian mixture model (2017)
- Author(s)
  Patrick Lumban Tobing, Hirokazu Kameoka, Tomoki Toda
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15 – 2017-03-17
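[Presentation] An investigation of F0 transformation methods for voice conversion based on spectral differential compensation (2017)
- Author(s)
  Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)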
- Year and Date
  2017-03-15 – 2017-03-17
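[Presentation] Improving speech intelligibility in noisy environments based on statistical speech waveform conversion (2017)
- Author(s)
  武山知弘, Kazuhiro Kobayashi, Yusuke Tajiri, Tomoki Toda, Kazuya Takeda
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)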
- Year and Date
  2017-03-15 – 2017-03-17
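[Presentation] Evaluation of a statistical voice conversion method with speech-organ movement manipulation capability (2017)
- Author(s)
  伊佐衣代, Patrick Lumban Tobing, Kou Tanaka, Tomoki Toda, Satoshi Nakamura
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)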
- Year and Date
  2017-03-15 – 2017-03-17
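[Presentation] Segment-feature-regularized NTF for nonaudible murmur enhancement (2017)
- Author(s)
  Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)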
- Year and Date
  2017-03-15 – 2017-03-17
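[Presentation] Analysis and modification of speech signals: how can speech be converted freely? (2017)
- Author(s)
  Tomoki Toda
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)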
- Year and Date
  2017-03-15 – 2017-03-17
- Invited
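[Presentation] A basis training algorithm for discriminative NMF using the auxiliary function method (2017)
- Author(s)
  Li Li, Hirokazu Kameoka, Shoji Makino
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)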
- Year and Date
  2017-03-15 – 2017-03-17
[Presentation] Vocal Tract Spectrogram Estimation with Formant Frequency Contour Factorization (2017)
- Author(s)
  鄒雲漢, Li Li, Hirokazu Kameoka
- Organizer
  Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Meiji University (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15 – 2017-03-17
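[Presentation] Progress and challenges in voice conversion technology (2017)
- Author(s)
  Tomoki Toda
- Organizer
  General Meeting and Lecture of the Tokai Chapter, Acoustical Society of Japan
- Place of Presentation
  Rubura Ohzan (Nagoya, Aichi)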
- Year and Date
  2017-03-13 – 2017-03-13
- Invited
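[Presentation] Nonaudible murmur enhancement in noisy environments based on segment-feature-regularized NTF (2017)
- Author(s)
  Yusuke Tajiri, Hirokazu Kameoka, Tomoki Toda
- Organizer
  IEICE / Acoustical Society of Japan Technical Meeting on Speech
- Place of Presentation
  Okinawa Industry Support Center (Naha, Okinawa)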
- Year and Date
  2017-03-01 – 2017-03-02
[Presentation] Acoustic-to-articulatory inversion mapping with variational latent trajectory Gaussian mixture model (2017)
- Author(s)
  Patrick Lumban Tobing, Hirokazu Kameoka, Tomoki Toda
- Organizer
  IEICE / Acoustical Society of Japan Technical Meeting on Speech
- Place of Presentation
  Okinawa Industry Support Center (Naha, Okinawa)
- Year and Date
  2017-03-01 – 2017-03-02
[Presentation] Low delay statistical singing voice conversion with direct waveform modification based on spectral differential considering global variance (2016)
- Author(s)
  Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura
- Organizer
  5th Joint Meeting of the ASA and the ASJ
- Place of Presentation
  Hilton Hawaiian Village (Honolulu, Hawaii, USA)
- Year and Date
  2016-11-28 – 2016-12-02
- Int'l Joint Research
[Presentation] Evaluation of electrolarynx controlled by real-time statistical F0 prediction (2016)
- Author(s)
  Kou Tanaka, Tomoki Toda, Satoshi Nakamura
- Organizer
  5th Joint Meeting of the ASA and the ASJ
- Place of Presentation
  Hilton Hawaiian Village (Honolulu, Hawaii, USA)
- Year and Date
  2016-11-28 – 2016-12-02
- Int'l Joint Research
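[Presentation] Statistical voice conversion and its application to augmented speech production (2016)
- Author(s)
  Tomoki Toda
- Organizer
  Special Lecture, Frontier Research Institute for Information Science, Nagoya Institute of Technology
- Place of Presentation
  Nagoya Institute of Technology (Nagoya, Aichi)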
- Year and Date
  2016-11-18 – 2016-11-18
- Invited
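[Presentation] Speech enhancement using a trajectory hidden Markov model (2016)
- Author(s)
  Takuya Kishida, Hirokazu Kameoka, Yoshitaka Nakajima
- Organizer
  Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  University of Toyama (Toyama, Toyama)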
- Year and Date
  2016-09-14 – 2016-09-16
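[Presentation] Statistical audio signal processing (2016)
- Author(s)
  Hirokazu Kameoka
- Organizer
  11th Symposium of the Young Researcher Association for NLP Studies (YANS)
- Place of Presentation
  Hotel Seamore (Nishimuro District, Wakayama)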
- Year and Date
  2016-08-28 – 2016-08-30
- Invited
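[Presentation] Joint enhancement of speech in the spectral and cepstral domains (2016)
- Author(s)
  Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari, Shoji Makino
- Organizer
  IEICE / Acoustical Society of Japan Technical Meeting on Speech
- Place of Presentation
  Kyoto University (Kyoto, Kyoto)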
- Year and Date
  2016-08-17 – 2016-08-17
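[Presentation] Feature representations in sound information processing (2016)
- Author(s)
  Tomoki Toda
- Organizer
  MIRU2016, the 19th Meeting on Image Recognition and Understanding, special program MIRU x KIKU (organized session in cooperation with the Ongaku Symposium)
- Place of Presentation
  Act City Hamamatsu (Hamamatsu, Shizuoka)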
- Year and Date
  2016-08-01 – 2016-08-04
- Invited
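[Presentation] Decomposition and reconstruction of acoustic signals (2016)
- Author(s)
  Hirokazu Kameoka
- Organizer
  MIRU2016, the 19th Meeting on Image Recognition and Understanding, special program MIRU x KIKU (organized session in cooperation with the Ongaku Symposium)
- Place of Presentation
  Act City Hamamatsu (Hamamatsu, Shizuoka)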
- Year and Date
  2016-08-01 – 2016-08-04
- Invited
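[Presentation] Feature representations in sound information processing (2016)
- Author(s)
  Tomoki Toda
- Organizer
  IPSJ Special Interest Group on Music and Computer, Ongaku Symposium 2016 (111th meeting), organized session in cooperation with MIRU
- Place of Presentation
  Tokai University Takanawa Campus (Minato, Tokyo)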
- Year and Date
  2016-05-21 – 2016-05-22
- Invited
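[Presentation] Decomposition and reconstruction of acoustic signals (2016)
- Author(s)
  Hirokazu Kameoka
- Organizer
  IPSJ Special Interest Group on Music and Computer, Ongaku Symposium 2016 (111th meeting), organized session in cooperation with MIRU
- Place of Presentation
  Tokai University Takanawa Campus (Minato, Tokyo)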
- Year and Date
  2016-05-21 – 2016-05-22
- Invited
