
FY2023 Annual Research Report

Low-Resource Machine Translation Integrating Pre-training and Multilingual Semantic Representation Learning

Research Project

Project/Area Number: 22KJ1843
Allocation Type: Multi-year Fund
Research Institution: Kyoto University

Principal Investigator

Zhuoyuan Mao (毛 卓遠), Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC2)

Project Period (FY): 2023-03-08 – 2024-03-31
Keywords: low-resource translation / sentence embedding
Outline of Research Achievements

In the last fiscal year, we developed LEALLA, a state-of-the-art lightweight sentence embedding model. With this pre-trained sentence-level semantic model, new parallel corpora can be constructed more efficiently. We also analyzed the Transformer architecture for low-resource translation and published a paper on this analysis at a top-tier conference. Finally, we compiled all of this work into a thesis.
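As a rough illustration of how a sentence embedding model can support parallel corpus construction, the sketch below pairs source and target sentences that are mutual nearest neighbours under cosine similarity. It is not the pipeline used in this project: the toy 3-dimensional vectors stand in for real sentence embeddings, and the `mine_pairs` helper and the 0.8 threshold are assumptions made only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Return (src_idx, tgt_idx, score) for mutual nearest neighbours.

    A pair is kept only if each sentence is the other's best match and
    the similarity reaches the (hypothetical) threshold.
    """
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target for every source sentence.
    best_tgt = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sims]
    # Best source for every target sentence.
    best_src = [
        max(range(len(src_vecs)), key=lambda i: sims[i][j])
        for j in range(len(tgt_vecs))
    ]
    pairs = []
    for i, j in enumerate(best_tgt):
        if best_src[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs


# Toy vectors standing in for embeddings of three source and three target sentences.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

In practice the vectors would come from the embedding model, and an approximate nearest-neighbour index would replace the full similarity matrix, which grows quadratically with corpus size.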
Overall, this research comprehensively explores multilingual representation learning, especially for low-resource translation, and addresses three challenges in this domain:
(1) To address the high computational cost that accompanies expanding the language coverage of multilingual models, we proposed an efficient and effective multilingual sentence embedding (MSE) model. We also introduced a new knowledge distillation method for training lightweight MSE models.
(2) To tackle data scarcity in low-resource languages, we proposed new pre-training objectives for low-resource neural machine translation (NMT). Additionally, we introduced word-level contrastive learning for low-resource NMT that uses statistical word alignments. We also introduced AlignInstruct, which improves the translation accuracy of large language models in low-resource languages.
(3) To address the limitations of the Transformer architecture in zero-shot NMT, we first proposed a new Transformer architecture that constructs interlingual representations on top of the Transformer encoder. We then comprehensively examined the effects of layer normalization in zero-shot NMT.
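The report does not say which layer normalization settings were examined in item (3). As a minimal sketch, assuming the question concerns the two standard placements of layer normalization around a residual sublayer (PostNorm and PreNorm), the difference can be written as follows. The `sublayer` function here is a hypothetical stand-in for attention or a feed-forward block, and the learnable gain and bias of layer normalization are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm(x, sublayer):
    """PostNorm: LayerNorm(x + Sublayer(x)); the output itself is normalized."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm(x, sublayer):
    """PreNorm: x + Sublayer(LayerNorm(x)); the residual path is left untouched."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def sublayer(h):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [0.5 * v for v in h]


x = [1.0, 2.0, 3.0, 6.0]
post_out = post_norm(x, sublayer)
pre_out = pre_norm(x, sublayer)

print("PostNorm output mean:", sum(post_out) / len(post_out))
print("PreNorm output mean: ", sum(pre_out) / len(pre_out))
```

With PostNorm the block's output is re-centred at every layer, whereas with PreNorm the unnormalized input flows straight through the residual connection, so the output keeps the input's mean.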

  • Research Products

    (8 results)


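All: Journal Article (1 result; of which peer-reviewed: 1, open access: 1), Presentation (5 results; of which international conferences: 5), Conference/Symposium Organization (2 results)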

  • [Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)

    • Authors/Presenters
      Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu and Sadao Kurohashi
    • Journal Title

      Journal of Natural Language Processing

      Volume: 31, Issue: 1, Pages: 155-188

    • DOI

      10.5715/jnlp.31.155

    • Peer Reviewed / Open Access
  • [Presentation] GPT-RE: In-context Learning for Relation Extraction using Large Language Models (2023)

    • Authors/Presenters
      Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li and Sadao Kurohashi
    • Conference
      Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
    • International Conference
  • [Presentation] Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (2023)

    • Authors/Presenters
      Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu and Sadao Kurohashi
    • Conference
      Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
    • International Conference
  • [Presentation] Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation (2023)

    • Authors/Presenters
      Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi
    • Conference
      Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation)
    • International Conference
  • [Presentation] LEALLA: Learning Lightweight Language-agnostic Sentence Embedding with Knowledge Distillation (2023)

    • Authors/Presenters
      Zhuoyuan Mao and Tetsuji Nakagawa
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
    • International Conference
  • [Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2023)

    • Authors/Presenters
      Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023): Findings Volume
    • International Conference
  • [Conference/Symposium Organization] Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) (2023)

  • [Conference/Symposium Organization] The 24th Annual Conference of The European Association for Machine Translation (2023)


Published: 2024-12-25
