
2023 Fiscal Year Annual Research Report

Developing Low-Resource Multilingual Machine Speech Chain for Breaking Language Barriers

Research Project

Project/Area Number 21H03467
Allocation Type Single-year Grants
Research Institution Japan Advanced Institute of Science and Technology

Principal Investigator

SAKTI Sakriani  Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Associate Professor (00395005)

Co-Investigator (Kenkyū-buntansha) NAKAMURA Satoshi  Nara Institute of Science and Technology, Graduate School of Science and Technology, Professor (30263429)
Project Period (FY) 2021-04-01 – 2026-03-31
Keywords low-resource speech technology / multilingual speech recognition / multilingual speech synthesis / speech translation / Machine Speech Chain
Outline of Annual Research Achievements

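As crisis management becomes global in response to events such as the COVID-19 pandemic, and as large international events are held, the language barrier with residents and tourists from overseas has become a serious problem. Several speech translation services are already in practical use, but achieving high-accuracy translation requires developing speech translation based on supervised deep learning, which uses extensive speech data with corresponding transcriptions. This research proposes new unsupervised and semi-supervised deep learning mechanisms for multilingual language acquisition, based on the human language acquisition process and in particular the speech chain mechanism. The research is organized into the following tasks.
Task 1: Literature survey and verification concerning human language processing and cognition.
Task 2: Collection of speech and text data for low-resource languages.
Task 3: Development of a multilingual Machine Speech Chain framework that "learns multiple languages by listening while speaking" (offline semi-supervised learning).
Task 4: Improvement of the multilingual Machine Speech Chain framework to perform real-time learning (offline and online learning).
Task 5: Improvement of the multilingual Machine Speech Chain framework to perform self lifelong learning (online learning).
Task 6: Incorporation of machine translation into the multilingual Machine Speech Chain framework.
Task 7: Development of a multilingual Machine Speech Chain framework for speech translation that "translates by listening while speaking" (offline semi-supervised learning and online self lifelong learning).
Through FY2023 (Reiwa 5), we worked on Tasks 1-5, developing and improving the multilingual Machine Speech Chain framework. In total, this produced 3 invited talks (keynotes), 10 peer-reviewed international conference papers, and 7 domestic conference papers.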

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

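Through FY2023 (Reiwa 5), Tasks 1-5, the development and improvement of the multilingual Machine Speech Chain framework, proceeded as planned. In our work on the real-time (online learning) Machine Speech Chain, we further improved the performance of a machine speech chain framework that adapts to its conditions in real time, and the results were presented at IEEE ICASSP, a top international conference. In addition, we started Task 6, next year's goal of incorporating machine translation into the multilingual Machine Speech Chain framework, and published results at an international conference. However, large-scale work across many different languages is difficult because annotated speech data is scarce. We therefore proposed a visually grounded model approach to handle unknown, untranslated languages. For the multilingual systems, we also collaborated with universities in Indonesia and a research institute in Viet Nam and submitted papers to international conferences.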

Strategy for Future Research Activity

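In FY2024 (Reiwa 6), we will work on the following tasks.
Task 3: For the multilingual Machine Speech Chain framework that "learns multiple languages by listening while speaking" (offline semi-supervised learning), we will continue the experiments. In particular, we will investigate the feasibility of a new unsupervised Machine Speech Chain.
Task 4: For the improvement of the multilingual Machine Speech Chain framework and the development of real-time learning (offline and online learning), we will continue the incremental Machine Speech Chain experiments. In particular, we will improve the performance of the Machine Speech Chain and investigate more languages.
Task 5: For the improvement of the multilingual Machine Speech Chain framework and self lifelong learning (online learning), we will continue the experiments.
Task 6: We will incorporate machine translation into the multilingual Machine Speech Chain framework.
We will focus on Task 6 in particular while continuing to strengthen Tasks 3 through 5.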

  • Research Products

    (39 results)

Int'l Joint Research (2 results) / Journal Article (17 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 10 results, Open Access: 5 results) / Presentation (20 results) (of which Int'l Joint Research: 12 results, Invited: 3 results)

  • [Int'l Joint Research] Bandung Institute of Technology/University of Indonesia (Indonesia)

    • Country Name
      INDONESIA
    • Counterpart Institution
      Bandung Institute of Technology/University of Indonesia
  • [Int'l Joint Research] Institute of Information Technology (Viet Nam)

    • Country Name
      VIET NAM
    • Counterpart Institution
      Institute of Information Technology
  • [Journal Article] Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task (2023)

    • Author(s)
      Sakti Sakriani, Titalim Benita Angela
    • Journal Title

      Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

      Volume: Vol. 1 Pages: 1314-1321

    • DOI

      10.1109/ASRU57964.2023.10389730

    • Peer Reviewed
  • [Journal Article] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)

    • Author(s)
      Widiaputri Ruhiyah, Purwarianti Ayu, Lestari Dessi, Azizah Kurniawati, Tanaya Dipta, Sakti Sakriani
    • Journal Title

      Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

      Volume: Vol. 1 Pages: 16813-16824

    • DOI

      10.18653/v1/2023.emnlp-main.1045

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)

    • Author(s)
      Ika Hartanti Bella Septina, Tanaya Dipta, Azizah Kurniawati, Lestari Dessi Puji, Purwarianti Ayu, Sakti Sakriani
    • Journal Title

      Proceedings of the Conference of the Oriental COCOSDA

      Volume: Vol. 1 Pages: 1-6

    • DOI

      10.1109/O-COCOSDA60357.2023.10482965

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)

    • Author(s)
      Xi Hang, Sakti Sakriani
    • Journal Title

      Proceedings of the Conference of the Oriental COCOSDA

      Volume: Vol. 1 Pages: 1-6

    • DOI

      10.1109/O-COCOSDA60357.2023.10482968

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)

    • Author(s)
      Tran Chung, Luong Chi Mai, Sakti Sakriani
    • Journal Title

      Proceedings of the INTERSPEECH

      Volume: Vol. 1 Pages: 4464-4468

    • DOI

      10.21437/Interspeech.2023-2243

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)

    • Author(s)
      Takahashi Shun, Sakti Sakriani
    • Journal Title

      Proceedings of the INTERSPEECH

      Volume: Vol. 1 Pages: 416-420

    • DOI

      10.21437/Interspeech.2023-1321

    • Peer Reviewed / Open Access
  • [Journal Article] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)

    • Author(s)
      Tran Tu Dinh, Sakti Sakriani
    • Journal Title

      Proceedings of the INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)

      Volume: Vol. 1 Pages: 78-82

    • DOI

      10.21437/SIGUL.2023-17

    • Peer Reviewed / Open Access
  • [Journal Article] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)

    • Author(s)
      Nguyen Luan Thanh, Sakti Sakriani
    • Journal Title

      Proceedings of the INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)

      Volume: Vol. 1 Pages: 53-57

    • DOI

      10.21437/SIGUL.2023-12

    • Peer Reviewed / Open Access
  • [Journal Article] An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)

    • Author(s)
      Chen Jianan, Sakti Sakriani
    • Journal Title

      Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

      Volume: Vol. 1 Pages: 1-5

    • DOI

      10.1109/ICASSP49357.2023.10095119

    • Peer Reviewed
  • [Journal Article] Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition (2023)

    • Author(s)
      Novitasari Sashi, Sakti Sakriani, Nakamura Satoshi
    • Journal Title

      Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

      Volume: Vol. 1 Pages: 1-5

    • DOI

      10.1109/ICASSP49357.2023.10096128

    • Peer Reviewed
  • [Journal Article] Investigation of Cross-Lingual Mismatch in Low-resource ASR for Indonesian Ethnic Languages (2023)

    • Author(s)
      Sakti Sakriani, Titalim Benita Angela
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 761-762

  • [Journal Article] Maintaining Personal Styles in Multilingual TTS with STEN Approach in Diffusion Framework (2023)

    • Author(s)
      Tran Chung, Luong Chi Mai, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 775-776

  • [Journal Article] Non-Parallel Limited Data Emotion Voice Conversion with Variance Adapter and Non-Autoregressive Decoder (2023)

    • Author(s)
      Zhang Zhanhang, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 1013-1014

  • [Journal Article] Deep Sequential Generative Modeling for Unsupervised Learning of Linguistic Representations from Speech Streams (2023)

    • Author(s)
      Takahashi Shun, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 825-826

  • [Journal Article] Perceived Challenges in Simultaneous Japanese-English Translation (2023)

    • Author(s)
      Xi Hang, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 827-828

  • [Journal Article] Utilizing Self-Supervised Visually Grounded Speech Models for Aligning Unpaired and Untranscribed Bilingual Speech (2023)

    • Author(s)
      Nguyen Luan Thanh, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 829-830

  • [Journal Article] Generating Textual Prosody based on ASR (2023)

    • Author(s)
      Liu Mingxi, Sakti Sakriani
    • Journal Title

      Proceedings of the ASJ Spring Meeting

      Volume: Vol. 1 Pages: 831-832

  • [Presentation] Communicative Intelligent Systems towards Society 5.0 (2023)

    • Author(s)
      Sakti Sakriani
    • Organizer
      Sarasehan Nasional Pendidikan Tinggi Informatika dan Pemberian Tribute kepada Penggagas dan Pendidik Senior Teknik Informatika ITB
    • Invited
  • [Presentation] Language Technology for All: From the indigenous community perspectives (2023)

    • Author(s)
      Sakti Sakriani
    • Organizer
      "Data, Technologies and Benchmarks for the Spoken Languages of the World" Meeting, IEEE SLT
    • Int'l Joint Research / Invited
  • [Presentation] Language Technology for All: From the technology and indigenous community perspectives (2023)

    • Author(s)
      Sakti Sakriani
    • Organizer
      the 25th Conference of the Oriental COCOSDA
    • Int'l Joint Research / Invited
  • [Presentation] Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task (2023)

    • Author(s)
      Titalim Benita Angela
    • Organizer
      IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
    • Int'l Joint Research
  • [Presentation] Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian (2023)

    • Author(s)
      Widiaputri Ruhiyah
    • Organizer
      the Conference on Empirical Methods in Natural Language Processing (EMNLP)
    • Int'l Joint Research
  • [Presentation] Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models (2023)

    • Author(s)
      Ika Hartanti Bella Septina, Sakti Sakriani
    • Organizer
      the Oriental COCOSDA
    • Int'l Joint Research
  • [Presentation] Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation (2023)

    • Author(s)
      Xi Hang, Sakti Sakriani
    • Organizer
      the Oriental COCOSDA
    • Int'l Joint Research
  • [Presentation] STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework (2023)

    • Author(s)
      Tran Chung, Sakti Sakriani
    • Organizer
      INTERSPEECH
    • Int'l Joint Research
  • [Presentation] Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams (2023)

    • Author(s)
      Takahashi Shun, Sakti Sakriani
    • Organizer
      INTERSPEECH
    • Int'l Joint Research
  • [Presentation] Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning (2023)

    • Author(s)
      Tran Tu Dinh, Sakti Sakriani
    • Organizer
      the INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
    • Int'l Joint Research
  • [Presentation] VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models (2023)

    • Author(s)
      Nguyen Luan Thanh, Sakti Sakriani
    • Organizer
      the INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL)
    • Int'l Joint Research
  • [Presentation] An Isotropy Analysis for Self-Supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework (2023)

    • Author(s)
      Chen Jianan, Sakti Sakriani
    • Organizer
      the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Int'l Joint Research
  • [Presentation] Self-Adaptive Incremental Machine Speech Chain for Lombard TTS with High-Granularity ASR Feedback in Dynamic Noise Condition (2023)

    • Author(s)
      Novitasari Sashi, Sakti Sakriani, Nakamura Satoshi
    • Organizer
      the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Int'l Joint Research
  • [Presentation] Investigation of Cross-Lingual Mismatch in Low-resource ASR for Indonesian Ethnic Languages (2023)

    • Author(s)
      Benita Angela Titalim
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Maintaining Personal Styles in Multilingual TTS with STEN Approach in Diffusion Framework (2023)

    • Author(s)
      Tran Chung
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Non-Parallel Limited Data Emotion Voice Conversion with Variance Adapter and Non-Autoregressive Decoder (2023)

    • Author(s)
      Zhang Zhanhang
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Deep Sequential Generative Modeling for Unsupervised Learning of Linguistic Representations from Speech Streams (2023)

    • Author(s)
      Takahashi Shun
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Perceived Challenges in Simultaneous Japanese-English Translation (2023)

    • Author(s)
      Xi Hang
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Utilizing Self-Supervised Visually Grounded Speech Models for Aligning Unpaired and Untranscribed Bilingual Speech (2023)

    • Author(s)
      Sakti Sakriani
    • Organizer
      the ASJ Spring Meeting
  • [Presentation] Generating Textual Prosody based on ASR (2023)

    • Author(s)
      Liu Mingxi
    • Organizer
      the ASJ Spring Meeting

Published: 2024-12-25  
