Implementation of super low-delay voice conversion system

Research Project

Project/Area Number	19K20295
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Nagoya University
Principal Investigator	Kobayashi Kazuhiro, Nagoya University, Information Technology Center, Researcher (50815602)
Project Period (FY)	2019-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
  Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
  Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
  Fiscal Year 2019: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
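Keywords	voice conversion / speaker / deep learning / real-time / subjective evaluation experiment / frame-based processing / ultra-low-delay voice conversion / low delay / electrolarynx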
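Outline of Research at the Start	Voice conversion is a technique that converts the speaker identity of speech from an input speaker to a target speaker. Combined with a low-delay conversion method, it enables real-time voice conversion, in which the input speech is converted sequentially. However, real-time voice conversion is known to significantly degrade the quality of the converted speech. Voice conversion methods based on deep learning are expected to solve this problem, but they are known to tend toward larger delay, for reasons such as increased computational cost. This research project aims to realize voice conversion using an ultra-low-delay speech waveform generation method based on deep learning. It also investigates how feedback from ultra-low-delay voice conversion affects the input speaker, in order to clarify the usability of the method.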
Outline of Final Research Achievements	Voice conversion is a technique to convert one's speech into another speaker's speech. Combining it with streaming conversion techniques makes it possible to implement a low-delay voice conversion system. However, because there is a tradeoff between delay and conversion quality, shortening the delay has been observed to degrade conversion quality. To alleviate this problem, this research aimed to implement low-latency voice conversion systems using parallel or non-parallel utterances.
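Academic Significance and Societal Importance of the Research Achievements	Voice conversion targets the speech used in person-to-person communication. Except for exceptionally skilled speakers such as voice actors, the range of voice qualities an individual can produce is narrow, and most people find it difficult to fully imitate another person's voice. Voice conversion is expected to break through this limit on expressive range and let anyone speak in a wide variety of voices. In particular, low-delay voice conversion can convert input speech sequentially, so it is expected to greatly extend person-to-person communication. However, voice conversion that is both high-quality and low-delay remains difficult, so research results and findings toward achieving it are considered important.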

Report

(4 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report

Research Products
(7 results)

Presentation (5 results, of which Int'l Joint Research: 5 results) / Remarks (2 results)

[Presentation] An investigation of streaming non-autoregressive sequence-to-sequence voice conversion (2022)
- Author(s)
  T. Hayashi, K. Kobayashi, T. Toda
- Organizer
  IEEE ICASSP
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Crank: an open-source software for nonparallel voice conversion based on vector-quantized variational autoencoder (2021)
- Author(s)
  K. Kobayashi, W.-C. Huang, Y.-C. Wu, P.L. Tobing, T. Hayashi, T. Toda
- Organizer
  IEEE ICASSP
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Non-autoregressive sequence-to-sequence voice conversion (2021)
- Author(s)
  T. Hayashi, W.-C. Huang, K. Kobayashi, T. Toda
- Organizer
  IEEE ICASSP
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] CRANK: AN OPEN-SOURCE SOFTWARE FOR NONPARALLEL VOICE CONVERSION BASED ON VECTOR-QUANTIZED VARIATIONAL AUTOENCODER (2021)
- Author(s)
  Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda
- Organizer
  Proc. IEEE ICASSP
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN (2020)
- Author(s)
  K. Kobayashi, T. Toda
- Organizer
  Proc. EUSIPCO
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Remarks] Toda Laboratory, Graduate School of Informatics, Nagoya University
- URL
  https://www.toda.is.i.nagoya-u.ac.jp/publications_FY2022.html
- Related Report
  2021 Annual Research Report
[Remarks] crank
- URL
  https://github.com/k2kobayashi/crank
- Related Report
  2020 Research-status Report
