2023 Fiscal Year Annual Research Report

Developing Low-Resource Multilingual Machine Speech Chain for Breaking Language Barriers

Research Project

Project/Area Number	21H03467
Allocation Type	Single-year Grants
Research Institution	Japan Advanced Institute of Science and Technology
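Principal Investigator	SAKTI Sakriani, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Associate Professor (00395005)
Co-Investigator (Kenkyū-buntansha)	NAKAMURA Satoshi, Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (30263429)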
Project Period (FY)	2021-04-01 – 2026-03-31
Keywords	Low-resource speech technology / Multilingual speech recognition / Multilingual speech synthesis / Speech translation / Machine Speech Chain
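Outline of Annual Research Achievements	Crisis management has become global, as COVID-19 showed, and large international events continue to be held, so the language barrier between local communities and overseas residents and tourists has become a serious problem. Several speech translation services are already in practical use. However, high translation accuracy currently requires speech translation built on supervised deep learning, which depends on large amounts of speech data with matching transcriptions. In this research, we propose new unsupervised and semi-supervised deep learning mechanisms for multilingual language acquisition. These mechanisms are based on the human language acquisition process, in particular the Speech Chain mechanism. The research is organized into seven tasks. Task 1: survey and verification of the literature on human language processing and cognition. Task 2: collection of speech and text data for low-resource languages. Task 3: development of a multilingual Machine Speech Chain framework that "learns multiple languages by listening while speaking" (offline semi-supervised learning). Task 4: improvement of the multilingual Machine Speech Chain framework with real-time learning (offline and online learning). Task 5: improvement of the multilingual Machine Speech Chain framework with self-lifelong learning (online learning). Task 6: incorporation of machine translation into the multilingual Machine Speech Chain framework. Task 7: development of a multilingual Machine Speech Chain framework for speech translation that "translates by listening while speaking" (offline semi-supervised learning and online self-lifelong learning). Through FY2023 (Reiwa 5), we carried out Tasks 1 to 5, developing and improving the multilingual Machine Speech Chain framework. In total, this work produced three invited talks (keynotes), 10 peer-reviewed international conference papers, and 7 domestic conference papers.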
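Current Status of Research Progress	Category 2: Research has progressed on the whole more than it was originally planned. Reason: Through FY2023 (Reiwa 5), Tasks 1 to 5, the development and improvement of the multilingual Machine Speech Chain framework, proceeded as planned. Our work on real-time (online) learning in the Machine Speech Chain further improved the performance of a machine speech chain framework that adapts to changing conditions in real time. These results were presented at IEEE ICASSP, a top international conference. We also started Task 6, incorporating machine translation into the multilingual Machine Speech Chain framework, ahead of its scheduled start next year, and have already published this work at an international conference. However, large-scale work across many different languages is difficult because annotated speech data are scarce. We therefore proposed an approach based on visually grounded models to handle unseen, untranslated languages. In addition, for the multilingual systems we collaborated with universities in Indonesia and a research institute in Vietnam, and submitted joint papers to international conferences.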
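Strategy for Future Research Activity	In FY2024 (Reiwa 6), we will address the following tasks. Task 3 (development of the multilingual Machine Speech Chain framework that "learns multiple languages by listening while speaking", with offline semi-supervised learning): we will continue our experiments and, in particular, investigate the potential of a new unsupervised Machine Speech Chain. Task 4 (improvement of the multilingual Machine Speech Chain framework with real-time learning, both offline and online): we will continue experiments on the incremental Machine Speech Chain, focusing on improving its performance and covering more languages. Task 5 (improvement of the multilingual Machine Speech Chain framework with self-lifelong learning, online): we will continue our experiments. Task 6: we will incorporate machine translation into the multilingual Machine Speech Chain framework. Task 6 will be our main focus, and we will continue to strengthen Tasks 3 to 5.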

Research Products
(39 results)


Int'l Joint Research (2 results); Journal Article (17 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 10 results, Open Access: 5 results); Presentation (20 results) (of which Int'l Joint Research: 12 results, Invited: 3 results)

[Int'l Joint Research] Bandung Institute of Technology/University of Indonesia (Indonesia)
- Country Name
  INDONESIA
- Counterpart Institution
  Bandung Institute of Technology/University of Indonesia
[Int'l Joint Research] Institute of Information Technology (Vietnam)
- Country Name
  VIET NAM
- Counterpart Institution
  Institute of Information Technology
[Journal Article] Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task (2023)
- Author(s)
  Sakti Sakriani, Titalim Benita Angela
- Journal Title
  Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  Volume: Vol. 1 Pages: 1314-1321
- DOI
  10.1109/ASRU57964.2023.10389730
- Peer Reviewed
[Journal Article] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)
- Author(s)
  Widiaputri Ruhiyah, Purwarianti Ayu, Lestari Dessi, Azizah Kurniawati, Tanaya Dipta, Sakti Sakriani
- Journal Title
  Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
  Volume: Vol. 1 Pages: 16813-16824
- DOI
  10.18653/v1/2023.emnlp-main.1045
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)
- Author(s)
  Ika Hartanti Bella Septina, Tanaya Dipta, Azizah Kurniawati, Lestari Dessi Puji, Purwarianti Ayu, Sakti Sakriani
- Journal Title
  Proceedings of the Conference of the Oriental COCOSDA
  Volume: Vol. 1 Pages: 1-6
- DOI
  10.1109/O-COCOSDA60357.2023.10482965
- Peer Reviewed / Int'l Joint Research
[Journal Article] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)
- Author(s)
  Xi Hang, Sakti Sakriani
- Journal Title
  Proceedings of the Conference of the Oriental COCOSDA
  Volume: Vol. 1 Pages: 1-6
- DOI
  10.1109/O-COCOSDA60357.2023.10482968
- Peer Reviewed / Int'l Joint Research
[Journal Article] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)
- Author(s)
  Tran Chung, Luong Chi Mai, Sakti Sakriani
- Journal Title
  Proceedings of INTERSPEECH
  Volume: Vol. 1 Pages: 4464-4468
- DOI
  10.21437/Interspeech.2023-2243
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)
- Author(s)
  Takahashi Shun, Sakti Sakriani
- Journal Title
  Proceedings of INTERSPEECH
  Volume: Vol. 1 Pages: 416-420
- DOI
  10.21437/Interspeech.2023-1321
- Peer Reviewed / Open Access
[Journal Article] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)
- Author(s)
  Tran Tu Dinh, Sakti Sakriani
- Journal Title
  Proceedings of the INTERSPEECH Satellite Workshop of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
  Volume: Vol. 1 Pages: 78-82
- DOI
  10.21437/SIGUL.2023-17
- Peer Reviewed / Open Access
[Journal Article] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)
- Author(s)
  Nguyen Luan Thanh, Sakti Sakriani
- Journal Title
  Proceedings of the INTERSPEECH Satellite Workshop of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
  Volume: Vol. 1 Pages: 53-57
- DOI
  10.21437/SIGUL.2023-12
- Peer Reviewed / Open Access
[Journal Article] An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)
- Author(s)
  Chen Jianan, Sakti Sakriani
- Journal Title
  Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Volume: Vol. 1 Pages: 1-5
- DOI
  10.1109/ICASSP49357.2023.10095119
- Peer Reviewed
[Journal Article] Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition (2023)
- Author(s)
  Novitasari Sashi, Sakti Sakriani, Nakamura Satoshi
- Journal Title
  Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Volume: Vol. 1 Pages: 1-5
- DOI
  10.1109/ICASSP49357.2023.10096128
- Peer Reviewed
[Journal Article] Investigation of Cross-Lingual Mismatch in Low-resource ASR for Indonesian Ethnic Languages (2023)
- Author(s)
  Sakti Sakriani, Titalim Benita Angela
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 761-762
[Journal Article] Maintaining Personal Styles in Multilingual TTS with STEN Approach in Diffusion Framework (2023)
- Author(s)
  Tran Chung, Luong Chi Mai, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 775-776
[Journal Article] Non-Parallel Limited Data Emotion Voice Conversion with Variance Adapter and Non-Autoregressive Decoder (2023)
- Author(s)
  Zhang Zhanhang, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 1013-1014
[Journal Article] Deep Sequential Generative Modeling for Unsupervised Learning of Linguistic Representations from Speech Streams (2023)
- Author(s)
  Takahashi Shun, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 825-826
[Journal Article] Perceived Challenges in Simultaneous Japanese-English Translation (2023)
- Author(s)
  Xi Hang, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 827-828
[Journal Article] Utilizing Self-Supervised Visually Grounded Speech Models for Aligning Unpaired and Untranscribed Bilingual Speech (2023)
- Author(s)
  Nguyen Luan Thanh, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 829-830
[Journal Article] Generating Textual Prosody based on ASR (2023)
- Author(s)
  Liu Mingxi, Sakti Sakriani
- Journal Title
  Proceedings of the ASJ Spring Meeting
  Volume: Vol. 1 Pages: 831-832
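[Presentation] Communicative Intelligent Systems towards Society 5.0 (2023)
- Author(s)
  Sakti Sakriani
- Organizer
  Sarasehan Nasional Pendidikan Tinggi Informatika dan Pemberian Tribute kepada Penggagas dan Pendidik Senior Teknik Informatika ITB (National Forum on Higher Education in Informatics and Tribute to the Founders and Senior Educators of Informatics Engineering at ITB)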
- Invited
[Presentation] Language Technology for All: From the indigenous community perspectives (2023)
- Author(s)
  Sakti Sakriani
- Organizer
  "Data, Technologies and Benchmarks for the Spoken Languages of the World" Meeting, IEEE SLT
- Int'l Joint Research / Invited
[Presentation] Language Technology for All: From the technology and indigenous community perspectives (2023)
- Author(s)
  Sakti Sakriani
- Organizer
  the 25th Conference of the Oriental COCOSDA
- Int'l Joint Research / Invited
[Presentation] Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task (2023)
- Author(s)
  Titalim Benita Angela
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- Int'l Joint Research
[Presentation] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)
- Author(s)
  Widiaputri Ruhiyah
- Organizer
  the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Int'l Joint Research
[Presentation] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)
- Author(s)
  Ika Hartanti Bella Septina, Sakti Sakriani
- Organizer
  the Oriental COCOSDA
- Int'l Joint Research
[Presentation] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)
- Author(s)
  Xi Hang, Sakti Sakriani
- Organizer
  the Oriental COCOSDA
- Int'l Joint Research
[Presentation] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)
- Author(s)
  Tran Chung, Sakti Sakriani
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)
- Author(s)
  Takahashi Shun, Sakti Sakriani
- Organizer
  INTERSPEECH
- Int'l Joint Research
[Presentation] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)
- Author(s)
  Tran Tu Dinh, Sakti Sakriani
- Organizer
  the INTERSPEECH Satellite Workshop of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
- Int'l Joint Research
[Presentation] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)
- Author(s)
  Nguyen Luan Thanh, Sakti Sakriani
- Organizer
  the INTERSPEECH Satellite Workshop of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
- Int'l Joint Research
[Presentation] An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)
- Author(s)
  Chen Jianan, Sakti Sakriani
- Organizer
  the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition (2023)
- Author(s)
  Novitasari Sashi, Sakti Sakriani, Nakamura Satoshi
- Organizer
  the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Investigation of Cross-Lingual Mismatch in Low-resource ASR for Indonesian Ethnic Languages (2023)
- Author(s)
  Titalim Benita Angela
- Organizer
  the ASJ Spring Meeting
[Presentation] Maintaining Personal Styles in Multilingual TTS with STEN Approach in Diffusion Framework (2023)
- Author(s)
  Tran Chung
- Organizer
  the ASJ Spring Meeting
[Presentation] Non-Parallel Limited Data Emotion Voice Conversion with Variance Adapter and Non-Autoregressive Decoder (2023)
- Author(s)
  Zhang Zhanhang
- Organizer
  the ASJ Spring Meeting
[Presentation] Deep Sequential Generative Modeling for Unsupervised Learning of Linguistic Representations from Speech Streams (2023)
- Author(s)
  Takahashi Shun
- Organizer
  the ASJ Spring Meeting
[Presentation] Perceived Challenges in Simultaneous Japanese-English Translation (2023)
- Author(s)
  Xi Hang
- Organizer
  the ASJ Spring Meeting
[Presentation] Utilizing Self-Supervised Visually Grounded Speech Models for Aligning Unpaired and Untranscribed Bilingual Speech (2023)
- Author(s)
  Sakti Sakriani
- Organizer
  the ASJ Spring Meeting
[Presentation] Generating Textual Prosody based on ASR (2023)
- Author(s)
  Liu Mingxi
- Organizer
  the ASJ Spring Meeting