Research on Active Speech Synthesis Based on Listener Models (聞き手モデルに基づく能動的音声合成に関する研究)

Research Project

Project/Area Number	18J22090
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Yuki Saito (齋藤佑樹), The University of Tokyo, Graduate School of Information Science and Technology, JSPS Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,500,000 (Direct Cost: ¥2,500,000); Fiscal Year 2020: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2019: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2018: ¥900,000 (Direct Cost: ¥900,000)
Keywords	Speech synthesis / Voice conversion / Deep learning
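Outline of Annual Research Achievements	This project aims to model human speech perception statistically and thereby realize speech synthesis technology that can freely generate and control diverse speech. The project has two aims. First, it seeks to improve the quality of the synthetic speech that such technology produces. Second, it addresses shortcomings of conventional techniques, such as the low interpretability of the auxiliary inputs to speech synthesis (e.g., features representing the speaker) that are used to obtain the desired synthetic speech. Such technology could be applied to extending self-expression beyond physical constraints through voice virtual reality. It could also support speech synthesis that adapts to the environments in which it is actually used. This fiscal year, we focused on two topics: (1) speaker-vector learning using a graph representation of users' subjective impressions, and (2) active learning that alternates between collecting subjective impression scores and learning speaker vectors. For (1), we represented the perceptual similarity among multiple speakers as a graph. We then proposed a method that learns features representing each speaker (speaker vectors) through deep-learning-based graph representation learning. Experimental evaluation showed that, among the compared speaker representations, the speaker vectors obtained by graph learning were the most effective at improving the naturalness of synthetic speech. This work received the Acoustical Society of Japan's Awaya Kiyoshi Science Promotion Award (粟屋潔学術奨励賞). For (2), we proposed a method that alternates perceptual evaluation of inter-speaker similarity with speaker-vector learning, so that easily interpretable speaker vectors can be learned at low computational and evaluation cost. Because FY2020 was the final year of the project, we also summarized our research results. A journal paper summarizing the results to date was accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), a flagship journal in speech signal processing. Furthermore, the doctoral dissertation, which incorporates the results of this project, was very highly evaluated. It received the Dean's Award of the Graduate School of Information Science and Technology, The University of Tokyo, which is given to the single most outstanding doctoral student selected from each department.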
Research Progress Status	Not entered, because FY2020 (Reiwa 2) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2020 (Reiwa 2) is the final year of the project.

Report (3 results)

Research Products (13 results)

Journal Article (4 results; of which Int'l Joint Research: 1, Peer Reviewed: 4, Open Access: 4); Presentation (9 results; of which Int'l Joint Research: 2)

[Journal Article] Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification (2021)
- Author(s)
  Saito Yuki, Nakamura Taiki, Ijima Yusuke, Nishida Kyosuke, Takamichi Shinnosuke
- Journal Title
  Acoustical Science and Technology
  Volume: 42 Issue: 1 Pages: 1-11
- DOI
  10.1250/ast.42.1
- NAID
  130007965442
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Year and Date
  2021-01-01
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling (2021)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 29 Pages: 1033-1048
- DOI
  10.1109/taslp.2021.3059114
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams (2020)
- Author(s)
  SAITO Yuki, AKUZAWA Kei, TACHIBANA Kentaro
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E103.D Issue: 9 Pages: 1978-1987
- DOI
  10.1587/transinf.2019EDP7297
- NAID
  130007894624
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2020-09-01
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra (2019)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
- Journal Title
  Computer Speech & Language
  Volume: 58 Pages: 347-363
- DOI
  10.1016/j.csl.2019.05.008
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Active learning for DNN speaker embedding considering subjective inter-speaker similarity [主観的話者間類似度を考慮したDNN話者埋め込みのためのActive Learning] (2021)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)
- Related Report
  2020 Annual Research Report
[Presentation] DNN speaker embedding based on graph embedding of subjective inter-speaker similarity [主観的話者間類似度のグラフ埋め込みに基づくDNN話者埋め込み] (2020)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2020 Autumn Meeting
- Related Report
  2020 Annual Research Report
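[Presentation] SMASH corpus: A spontaneous speech corpus based on post-recorded live commentary for gameplay videos [SMASHコーパス：ゲーム動画の後付け実況解説音声収録に基づく自発発話音声コーパス] (2020)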
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2020 Spring Meeting
- Related Report
  2019 Annual Research Report
[Presentation] DNN-based speaker embedding using subjective inter-speaker similarity for multi-speaker modeling in speech synthesis (2019)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
- Organizer
  The 10th ISCA Speech Synthesis Workshop (SSW)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
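[Presentation] Joint adversarial training of speech recognition and generation models for many-to-one voice conversion using phonetic posteriorgrams [音素事後確率を用いた多対一音声変換のための音声認識・生成モデルの同時敵対学習] (2019)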
- Author(s)
  Yuki Saito, Kei Akuzawa, Kentaro Tachibana
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Autumn Meeting
- Related Report
  2019 Annual Research Report
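[Presentation] Experimental evaluation of multi-speaker DNN-based speech synthesis using DNN speaker embeddings based on subjective inter-speaker similarity [主観的話者間類似度に基づくDNN話者埋め込みを用いた多数話者DNN音声合成の実験的評価] (2019)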
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Autumn Meeting
- Related Report
  2019 Annual Research Report
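[Presentation] Evaluation of training data size and d-vector dimensionality in non-parallel many-to-many VAE-based voice conversion using phonetic posteriorgrams and d-vectors [音素事後確率とd-vectorを用いたノンパラレル多対多VAE音声変換における学習データ量とd-vector次元数に関する評価] (2019)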
- Author(s)
  Taiki Nakamura, Yuki Saito, Kyosuke Nishida, Yusuke Ijima, Shinnosuke Takamichi
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Spring Meeting
- Related Report
  2018 Annual Research Report
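[Presentation] DNN speaker embedding considering subjective inter-speaker similarity for DNN-based speech synthesis [DNN音声合成に向けた主観的話者間類似度を考慮したDNN話者埋め込み] (2019)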
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Spring Meeting
- Related Report
  2018 Annual Research Report
[Presentation] Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors (2018)
- Author(s)
  Yuki Saito, Yusuke Ijima, Kyosuke Nishida, and Shinnosuke Takamichi
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research

聞き手モデルに基づく能動的音声合成に関する研究

Principal Investigator

齋藤 佑樹 東京大学, 情報理工学系研究科, 特別研究員(DC1)

¥2,500,000 (Direct Cost: ¥2,500,000)

Report

Research Products

[Journal Article] Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification2021

Author(s)

Journal Title

DOI

NAID

ISSN

Year and Date

Related Report

[Journal Article] Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling2021

Author(s)

Journal Title

DOI

Related Report

[Journal Article] Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams2020

Author(s)

Journal Title

DOI

NAID

ISSN

Year and Date

Related Report

[Journal Article] Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra2019

Author(s)

Journal Title

DOI

Related Report

[Presentation] 主観的話者間類似度を考慮したDNN話者埋め込みのためのActive Learning2021

Author(s)

Organizer

Related Report

[Presentation] 主観的話者間類似度のグラフ埋め込みに基づくDNN話者埋め込み2020

Author(s)

Organizer

Related Report

[Presentation] SMASHコーパス：ゲーム動画の後付け実況解説音声収録に基づく自発発話音声コーパス2020

Author(s)

Organizer

Related Report

[Presentation] DNN-based speaker embedding using subjective inter-speaker similarity for multi-speaker modeling in speech synthesis2019

Author(s)

Organizer

Related Report

[Presentation] 音素事後確率を用いた多対一音声変換のための音声認識・生成モデルの同時敵対学習2019

Author(s)

Organizer

Related Report

[Presentation] 主観的話者間類似度に基づくDNN話者埋め込みを用いた多数話者DNN音声合成の実験的評価2019

Author(s)

Organizer

Related Report

[Presentation] 音素事後確率とd-vectorを用いたノンパラレル多対多VAE音声変換における学習データ量とd-vector次元数に関する評価2019

Author(s)

Organizer

Related Report

[Presentation] DNN音声合成に向けた主観的話者間類似度を考慮したDNN話者埋め込み2019

Author(s)

Organizer

Related Report

[Presentation] Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors2018

Author(s)

Organizer

Related Report

齋藤佑樹東京大学, 情報理工学系研究科, 特別研究員(DC1)