2023 Fiscal Year Annual Research Report

意図を的確に伝える音声対話翻訳の基盤技術の創出 (Creating Fundamental Technologies for Speech Dialogue Translation That Accurately Conveys Intent)

Research Project

Project/Area Number	23H03454
Allocation Type	Single-year Grants
Research Institution	Kyoto University
Principal Investigator	チョ シンキ (Chenhui Chu), Kyoto University, Graduate School of Informatics, Program-Specific Associate Professor (70784891)
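Co-Investigator (Kenkyū-buntansha)	李 勝 (Sheng Li), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Researcher (70840940)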
Project Period (FY)	2023-04-01 – 2027-03-31
Keywords	Speech emotion recognition / Speech translation / Meta-intervention
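Outline of Annual Research Achievements	This research opens up a new machine translation paradigm, "speech dialogue translation," and creates its fundamental technologies. Speech dialogue translation performs speech-to-speech translation that exploits prosody to convey intent accurately while preserving the speaker's attributes and emotions. In addition, the system observes the multilingual dialogue as a whole. When the dialogue is not proceeding as intended, it tells the speakers so and prompts them to revise their utterances or take similar action. This enables reliable support for multilingual dialogue. In FY2023 we carried out the following research and development.
1. Improving speech emotion recognition through ASR and speaker gender pretraining. In a two-stage fine-tuning method for speech emotion recognition, a self-supervised learning model is first pretrained on automatic speech recognition (ASR) so that it learns linguistic information. We also investigated combining ASR pretraining with speaker gender estimation pretraining. These results were presented at INTERSPEECH 2023.
2. Building a speech dialogue translation corpus and system. We constructed a speech dialogue translation corpus by adding Japanese and English speech to the Business Scene Dialogue parallel corpus. The Japanese speech was collected via Yahoo! Crowdsourcing and the English speech via Amazon Mechanical Turk, together with each speaker's gender and place of origin. Using this corpus, we built a speech dialogue translation system in a cascade framework. In this framework, ASR transcribes the source-language speech into text, and machine translation then translates the source-language text into target-language text. These results were presented at ACL 2023.
3. Toward meta-observation of and intervention in multilingual dialogue, we studied how to formalize misunderstandings in monolingual dialogue and how to resolve them through meta-intervention. Misunderstanding is an important phenomenon in dialogue, but what exactly it is has not been clearly defined. We organized dialogue misunderstandings on the basis of Clark's theory of language use. We also showed experimentally that intervention by a third party helps resolve them. These results were presented at NLP 2024.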
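The cascade framework described in item 2 can be illustrated with a minimal Python sketch. This is not the system's actual code. It assumes hypothetical `asr` and `mt` callables (stubbed here, standing in for real recognition and translation models). It also assumes a hypothetical `Utterance` record holding the audio together with the speaker's gender and place of origin. The sketch only shows the order of operations: each turn is transcribed in its own language and then translated into the other language, while speaker metadata is carried along for later stages.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy record for one utterance of a speech dialogue translation corpus:
# the recording plus the speaker metadata (gender, place of origin).
# Field names are illustrative assumptions, not the corpus's actual schema.
@dataclass
class Utterance:
    speaker: str
    gender: str
    origin: str
    audio: bytes
    lang: str  # source language code: "ja" or "en"

@dataclass
class TranslatedTurn:
    speaker: str
    gender: str        # carried along for later stages; the text cascade ignores it
    source_text: str
    target_text: str

# Hypothetical plug-in interfaces; any ASR or MT system with these shapes fits.
ASR = Callable[[bytes, str], str]       # (audio, source lang) -> transcript
MT = Callable[[str, str, str], str]     # (text, source lang, target lang) -> translation

def cascade_translate(dialogue: List[Utterance], asr: ASR, mt: MT) -> List[TranslatedTurn]:
    """Translate a Japanese-English dialogue turn by turn: ASR first, then text MT."""
    turns = []
    for utt in dialogue:
        target_lang = "en" if utt.lang == "ja" else "ja"
        transcript = asr(utt.audio, utt.lang)                # speech -> source-language text
        translation = mt(transcript, utt.lang, target_lang)  # source text -> target text
        turns.append(TranslatedTurn(utt.speaker, utt.gender, transcript, translation))
    return turns

if __name__ == "__main__":
    # Stub components so the sketch runs without any models.
    fake_asr = lambda audio, lang: audio.decode("utf-8")
    fake_mt = lambda text, src, tgt: f"[{src}->{tgt}] {text}"
    dialogue = [
        Utterance("A", "female", "Tokyo", b"konnichiwa", "ja"),
        Utterance("B", "male", "Ohio", b"Hello", "en"),
    ]
    for turn in cascade_translate(dialogue, fake_asr, fake_mt):
        print(turn.speaker, turn.target_text)
```

Keeping ASR and MT behind plain callables mirrors the modularity of a cascade: either component can be swapped independently.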
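Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	The research planned for FY2023 was speaker attribute estimation and speech emotion recognition that captures speaker attributes. We have verified that speaker attribute estimation improves the performance of the speech emotion recognition model. In addition, the speech dialogue translation corpus and system have been built, so preparations for next year's research are in place.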
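Strategy for Future Research Activity	In FY2024 we plan to study the following topics.
1. Speech emotion recognition capturing speaker attributes. Use of pretrained language models: data scarcity appears to be the bottleneck of speech emotion recognition. In preliminary experiments on Japanese and English emotional speech data, accuracy stayed at about 60% for each. In a preliminary experiment on the Japanese data, we applied the pretrained language model BERT to the gold-standard transcripts and confirmed that accuracy rises to about 80%. We will therefore study how to leverage large-scale pretrained text models for speech emotion recognition.
2. Emotion-aware machine translation.
2.1) Creating an emotion-aware translation evaluation set: evaluating emotion-aware machine translation requires an evaluation set in which the meaning differs depending on the emotion. We have been building a video-accompanied multimodal machine translation dataset focused on translation ambiguity, and we will apply that know-how to create the evaluation set.
2.2) Developing an emotion-aware text translation model: building on findings from domain adaptation and multilingual machine translation, we will develop a machine translation model that captures emotion.
3. Speech synthesis with speaker attributes and emotion. Joint training for speech synthesis with speaker attributes and emotion: we will start from a cascade system in which separate models are trained on speaker-attribute data and on emotional speech data. Drawing on our prior know-how in speech synthesis, we will then study joint training of speech synthesis with speaker attributes and emotion.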
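The preliminary results in item 1 (about 60% from speech versus about 80% with BERT on gold transcripts) motivate bringing a text model into speech emotion recognition. One common, simple way to do this is late fusion, which averages the class distributions of an acoustic classifier and a text classifier. The Python sketch below illustrates only that idea. The emotion label set, the toy logits, and the `text_weight` parameter are assumptions, and the project's actual approach may differ. In a deployed system the text classifier would likely see ASR transcripts rather than gold transcripts, so its accuracy there may be lower.

```python
import math
from typing import Dict, List

# Illustrative label set; the actual emotion categories are an assumption.
EMOTIONS = ["neutral", "happy", "angry", "sad"]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(acoustic_logits: List[float], text_logits: List[float],
                text_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of the acoustic and text models' class distributions."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    n = len(EMOTIONS)
    if len(acoustic_logits) != n or len(text_logits) != n:
        raise ValueError("expected one logit per emotion class")
    p_acoustic = softmax(acoustic_logits)
    p_text = softmax(text_logits)
    fused = [(1.0 - text_weight) * a + text_weight * t
             for a, t in zip(p_acoustic, p_text)]
    return dict(zip(EMOTIONS, fused))

def predict(acoustic_logits: List[float], text_logits: List[float],
            text_weight: float = 0.5) -> str:
    fused = late_fusion(acoustic_logits, text_logits, text_weight)
    return max(fused, key=fused.get)

# Toy logits (hypothetical): the acoustic model is unsure, the text model is confident.
acoustic = [1.0, 0.8, 0.9, 0.7]
text = [0.0, 0.0, 3.0, 0.0]
print(predict(acoustic, text))                   # angry
print(predict(acoustic, text, text_weight=0.0))  # neutral (acoustic model only)
```

Late fusion keeps the two classifiers independent, so each can be trained on whatever data it has. The `text_weight` value could then be tuned on held-out data.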

Research Products
(15 results)

Int'l Joint Research (1 result) / Journal Article (3 results; peer reviewed: 3, open access: 1) / Presentation (10 results; Int'l joint research: 7) / Remarks (1 result)

[Int'l Joint Research] Nanyang Technological University (Singapore)
- Country Name
  SINGAPORE
- Counterpart Institution
  Nanyang Technological University
[Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)
- Author(s)
  Song Haiyue、Mao Zhuoyuan、Dabre Raj、Chu Chenhui、Kurohashi Sadao
- Journal Title
  Journal of Natural Language Processing
  Volume: 31 Pages: 155-188
- DOI
  10.5715/jnlp.31.155
- Peer Reviewed / Open Access
[Journal Article] Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings (2023)
- Author(s)
  Soky Kak、Li Sheng、Chu Chenhui、Kawahara Tatsuya
- Journal Title
  International Journal of Asian Language Processing
  Volume: 33 Pages: 2350024
- DOI
  10.1142/S2717554523500248
- Peer Reviewed
[Journal Article] SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation (2023)
- Author(s)
  Song Haiyue、Dabre Raj、Chu Chenhui、Kurohashi Sadao、Sumita Eiichiro
- Journal Title
  ACM Transactions on Asian and Low-Resource Language Information Processing
  Volume: 22 Pages: 1-24
- DOI
  10.1145/3610611
- Peer Reviewed
[Presentation] Combining Large Language Model with Speech Recognition System in Low-resource Settings (2024)
- Author(s)
  Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Chen Chen, Eng Siong Chng, Hisashi Kawai
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024)
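[Presentation] 対話の齟齬と介入による解消：LLM を用いた検討 (Misunderstandings in Dialogue and Their Resolution through Intervention: An Investigation Using LLMs) (2024)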
- Author(s)
  清水周一郎, Yin Jou Huang, 村脇有吾, Chenhui Chu
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024)
[Presentation] Investigating Effective Methods for Combining Large Language Model with Speech Recognition System (2024)
- Author(s)
  Sheng Li, Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Hisashi Kawai
- Organizer
  The 151st Meeting of the Acoustical Society of Japan (Spring 2024)
[Presentation] Video-Helpful Multimodal Machine Translation (2023)
- Author(s)
  Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li
- Organizer
  In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). pp.4281-4299
- Int'l Joint Research
[Presentation] Two-stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining (2023)
- Author(s)
  Yuan Gao, Chenhui Chu, Tatsuya Kawahara
- Organizer
  Interspeech 2023. pp.3637-364
- Int'l Joint Research
[Presentation] Kyoto Speech-to-Speech Translation System for IWSLT 2023 (2023)
- Author(s)
  Zhengdong Yang, Shuichiro Shimizu, Zhou Wangjin, Sheng Li, Chenhui Chu
- Organizer
  In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023). pp.357-362
- Int'l Joint Research
[Presentation] Towards Speech Dialogue Translation Mediating Speakers of Different Languages (2023)
- Author(s)
  Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
- Organizer
  In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Findings Volume. pp.1122-1134
- Int'l Joint Research
[Presentation] Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-based Speech Recognition of Low-resource Language (2023)
- Author(s)
  Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara
- Organizer
  In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
- Int'l Joint Research
[Presentation] Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition (2023)
- Author(s)
  Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi
- Organizer
  In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
- Int'l Joint Research
[Presentation] KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis (2023)
- Author(s)
  Wangjin Zhou, Zhengdong Yang, Sheng Li, Chenhui Chu
- Organizer
  In Proceedings of ACM Multimedia Asia Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)
- Int'l Joint Research
[Remarks] https://researchmap.jp/chu/
