Multi-lingual multi-speaker voice conversion system by non-parallel learning method

Research Project

Project/Area Number	20H04207
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 61010:Perceptual information processing-related
Research Institution	Japan Advanced Institute of Science and Technology
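Principal Investigator	Akagi Masato Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor Emeritus (20242571)
Co-Investigator (Kenkyū-buntansha)	Unoki Masashi Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (00343187)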
Project Period (FY)	2020-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥17,420,000 (Direct Cost: ¥13,400,000, Indirect Cost: ¥4,020,000); Fiscal Year 2023: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2022: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2021: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2020: ¥5,330,000 (Direct Cost: ¥4,100,000, Indirect Cost: ¥1,230,000)
Keywords	paralinguistic information / non-linguistic information / voice conversion / non-parallel learning / speaker individuality
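Outline of Research at the Start	Aiming to manipulate speaker identity through voice conversion (VC), this research (1) proposes a non-parallel learning method for cross-lingual VC and (2) builds a many-speaker attribute conversion system based on that learning method. The specific issues are (a) how to represent speaker information when the source and target languages of VC differ, (b) many-to-many speaker attribute conversion, so that anyone can serve as a speaker in the system, (c) how to describe speaker characteristics when unseen speakers are expected to be used, and (d) guaranteeing the quality and intelligibility of the synthesized speech after conversion. All of these issues are examined within a deep learning framework, and the system as a whole is then optimized by setting appropriate objective functions.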
Outline of Final Research Achievements	This study aims to enhance paralinguistic and non-linguistic information in multilingual speech through voice conversion (VC), with the manipulation of speaker identity in speech as one of its central objectives. To this end, we propose a non-parallel learning method for cross-lingual VC and explore the construction of a multi-speaker attribute conversion system based on this learning method. The issues addressed are (A) handling speaker information when the source and target languages of VC differ, (B) achieving multi-speaker-to-multi-speaker attribute conversion, (C) describing speaker characteristics when unseen speakers are expected to be used, and (D) ensuring the quality and intelligibility of synthesized speech after conversion. We address these issues within a deep learning framework and attempt to optimize the system as a whole by setting appropriate objective functions.
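Academic Significance and Societal Importance of the Research Achievements	The long-term goal is a cross-lingual voice conversion system for speech-to-speech translation that can extract a speaker's paralinguistic and non-linguistic information and add it to synthesized speech. As a first step, this research aims at free manipulation of speaker attributes (gender, age, voice quality, etc.), which are one kind of non-linguistic information: it proposes a non-parallel learning method for cross-lingual voice conversion and examines a conversion system based on it. This makes it possible to build a system that synthesizes speech in another language with the same voice quality as a speaker who spoke in a given language, regardless of which languages or speakers are used. Because the speaker attributes contained in the input speech are preserved in the output speech, the quality of communication can be improved.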

Report

(5 results)

2023 Annual Research Report Final Research Report ( PDF )
2022 Annual Research Report
2021 Annual Research Report
2020 Annual Research Report

Research Products
(17 results)

All Journal Article (10 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 10 results, Open Access: 1 results) Presentation (7 results) (of which Invited: 1 results)

[Journal Article] Increasing Speech Intelligibility by Mimicking Professional Announcers’ Voices and Its Physical Correlates (2023)
- Author(s)
  Dung Kim Tran, Masato Akagi, and Masashi Unoki
- Journal Title
  
  Proc APSIPA2023
  
  Volume: - Pages: 1162-1167
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Relationship Between Speakers’ Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network (2022)
- Author(s)
  Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, Sheng Li, Masashi Unoki
- Journal Title
  
  Proc. EUSIPCO2022
  
  Volume: - Pages: 379-383
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion (2022)
- Author(s)
  Tuan Vu Ho, Maori Kobayashi, Masato Akagi
- Journal Title
  
  Proc. Interspeech2022
  
  Volume: -
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection (2022)
- Author(s)
  Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki
- Journal Title
  
  Proc. Interspeech2022
  
  Volume: -
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Increasing speech intelligibility in noise based on concepts of modulation spectrum and voice conversion to professional announcer voice (2022)
- Author(s)
  Masato Akagi
- Journal Title
  
  Proc. of the 24th International Congress on Acoustics
  
  Volume: -
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Deep Hashing for Speaker Identification and Retrieval Based on Auditory Sparse Representation (2022)
- Author(s)
  Dung Kim Tran, Masato Akagi, and Masashi Unoki
- Journal Title
  
  Proc. APSIPA2022
  
  Volume: - Pages: 938-944
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] $F_0$-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model (2021)
- Author(s)
  Li Yongwei, Tao Jianhua, Erickson Donna, Liu Bin, Akagi Masato
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 29 Pages: 3375-3383
- DOI
  10.1109/taslp.2021.3120585
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Study on Simultaneous Estimation of Glottal Source and Vocal Tract Parameters by ARMAX-LF Model for Speech Analysis/Synthesis (2021)
- Author(s)
  Kai Li, Masashi Unoki, Yongwei Li, Jianwu Dang, Masato Akagi
- Journal Title
  
  Proceeding of APSIPA2021
  
  Volume: - Pages: 36-43
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Cross-Lingual Voice Conversion With Controllable Speaker Individuality Using Variational Autoencoder and Star Generative Adversarial Network (2021)
- Author(s)
  Ho Tuan Vu, Akagi Masato
- Journal Title
  
  IEEE Access
  
  Volume: 9 Pages: 47503-47515
- DOI
  10.1109/access.2021.3063519
- NAID
  120007003859
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Non-parallel Voice Conversion based on Hierarchical Latent Embedding Vector Quantized Variational Autoencoder (2020)
- Author(s)
  Ho Tuan Vu, Akagi Masato
- Journal Title
  
  Proceeding of Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020
  
  Volume: - Pages: 140-144
- NAID
  120006952244
- Related Report
  2020 Annual Research Report
- Peer Reviewed
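[Presentation] Toward Building a Voice Evacuation Guidance System That Conveys Information Reliably (2023)
- Author(s)
  Masato Akagi
- Organizer
  Acoustical Society of Japan, Speech Research Meeting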
- Related Report
  2022 Annual Research Report
[Presentation] Increasing Speech Intelligibility for Evacuation Guidance by Mimicking Professional Announcers’ Voice: Discussion on Speech Intelligibility and Its Physical Correlates (2023)
- Author(s)
  Kimdung Tran, Masato Akagi and Masashi Unoki
- Organizer
  Institute of Electronics, Information and Communication Engineers (IEICE), Speech Research Meeting
- Related Report
  2022 Annual Research Report
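[Presentation] Improving Speech Intelligibility in Noisy and Reverberant Environments by Speech Modification (2023)
- Author(s)
  Masato Akagi
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting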
- Related Report
  2022 Annual Research Report
- Invited
[Presentation] Estimation of Glottal Source Parameters of the LF Model Using Feed-forward Neural Network (2022)
- Author(s)
  Kai Li, Masato Akagi, Masashi Unoki
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting
- Related Report
  2021 Annual Research Report
[Presentation] Improving spectral detail and F0 modelling for VAE-based cross-lingual voice conversion with adversarial training (2021)
- Author(s)
  Tuan Vu Ho and Masato Akagi
- Organizer
  ASJ '2021 Spring Meeting
- Related Report
  2020 Annual Research Report
[Presentation] Estimation of Glottal Source Waveforms and Vocal Tract Shapes Based on ARMAX-LF Model (2021)
- Author(s)
  Kai Li, Yongwei Li, Jianwu Dang, Masashi Unoki, and Masato Akagi
- Organizer
  ASJ '2021 Spring Meeting
- Related Report
  2020 Annual Research Report
[Presentation] Cross-lingual voice conversion with Multi-codebook Hierarchical Vector-Quantized Variational Autoencoder (2020)
- Author(s)
  Tuan Vu Ho and Masato Akagi
- Organizer
  ASJ '2020 Fall Meeting
- Related Report
  2020 Annual Research Report
