
Low-Resource Machine Translation Integrating Pre-training and Multilingual Semantic Representation Learning

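Research Project

Project/Area Number 22KJ1843
Project Number under Grant-in-Aid 22J13719 (2022)
Research Category

Grant-in-Aid for JSPS Fellows

Allocation Type Multi-year Fund (2023)
Single-year Grants (2022)
Section Domestic
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution Kyoto University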

Principal Investigator

Mao Zhuoyuan (毛 卓遠)  Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC2)

Project Period (FY) 2023-03-08 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount *Note
¥1,700,000 (Direct Cost: ¥1,700,000)
Fiscal Year 2023: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2022: ¥900,000 (Direct Cost: ¥900,000)
Keywords low-resource translation / sentence embedding / multilingual translation / multilingual embedding / model efficiency
Outline of Research at the Start

As globalization progresses, demand for automatic multilingual language understanding and translation is rising sharply in many settings.
We aim to tackle the technical barriers in low-resource machine translation (LMT) and to design a robust multilingual translation system that supports a large number of languages, including several low-resource languages.
(A low-resource language is one for which we lack sufficient data to train a translation model.)

Outline of Research Achievements

In the last fiscal year, we developed LEALLA, a state-of-the-art lightweight sentence embedding model. With this pre-trained sentence-level semantic model, new parallel corpora can be constructed more efficiently. We also analyzed the Transformer architecture for low-resource translation and published a paper at a top conference. Finally, we compiled all of this work into a thesis.
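As a rough illustration of how a sentence embedding model can support parallel corpus construction, the sketch below pairs source and target sentences by cosine similarity of their embeddings. The vectors are toy values made up for this example, not LEALLA outputs, and `mine_pairs` and `threshold` are hypothetical names; the mining procedure actually used in the project is not described in this record.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """For each source vector, keep its best-scoring target if the score
    reaches the threshold (hypothetical helper, assumed for illustration)."""
    pairs = []
    for i, u in enumerate(src_vecs):
        scores = [cosine(u, v) for v in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs

# Toy embeddings standing in for sentence vectors in two languages.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(f"source {i} <-> target {j} (cosine {score:.3f})")
```

The third source vector has no target close enough to pass the threshold, so it is left unpaired. A lighter embedding model lowers the cost of computing the vectors, which is the expensive step when the candidate pool is large.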
Overall, this research is a comprehensive exploration of multilingual representation learning, especially for low-resource translation, and addresses three challenges identified in this domain:
(1) To address the high computational demand that accompanies the expansion of multilingual model language coverage, we proposed an efficient and effective multilingual sentence embedding (MSE) model. We also introduced a new knowledge distillation method for training lightweight MSE models.
(2) To tackle the challenge of data scarcity in low-resource languages, we proposed new pre-training objectives for low-resource neural machine translation (NMT). Additionally, we introduced word-level contrastive learning for low-resource NMT that uses statistical word alignments. We also introduced AlignInstruct to enhance translation accuracy in low-resource languages for large language models.
(3) To address the limitations of the Transformer architecture for zero-shot NMT, we first proposed a new Transformer architecture that constructs interlingual representations on top of the Transformer encoder. We also comprehensively examined the effects of layer normalization in zero-shot NMT.
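To make the layer normalization point in (3) concrete, the sketch below contrasts two common placements of layer normalization around a residual connection, usually called PostNorm and PreNorm. It is a generic illustration, not the configuration studied in the project: `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer, and the learned gain and bias of a real layer normalization are omitted for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]

def toy_sublayer(x):
    """Hypothetical stand-in for attention or feed-forward: doubles each feature."""
    return [2.0 * a for a in x]

def post_norm_block(x, sublayer):
    """PostNorm: add the residual first, then normalize the sum."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x, sublayer):
    """PreNorm: normalize only the sublayer input; the residual path is untouched."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 3.0, 4.0]
post = post_norm_block(x, toy_sublayer)
pre = pre_norm_block(x, toy_sublayer)
print("PostNorm output:", [round(v, 3) for v in post])
print("PreNorm output: ", [round(v, 3) for v in pre])
```

With PostNorm the block output is itself normalized, whereas with PreNorm the unnormalized input passes straight through the residual path, so the output keeps the input's mean. Which placement suits zero-shot translation is an empirical question that this sketch does not answer.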

Report

(2 results)
  • 2023 Annual Research Report
  • 2022 Annual Research Report
  • Research Products

    (19 results)


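All: Journal Article (3 results) (of which International Joint Research: 2 results, Peer Reviewed: 3 results, Open Access: 2 results) Presentation (13 results) (of which International Conference: 12 results) Conference/Symposium Held (3 results)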

  • [Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)

    • Author(s)/Presenter(s)
      Song Haiyue, Mao Zhuoyuan, Dabre Raj, Chu Chenhui, Kurohashi Sadao
    • Journal Title

      自然言語処理 (Journal of Natural Language Processing)

      Volume: 31 Issue: 1 Pages: 155-188

    • DOI

      10.5715/jnlp.31.155

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation (2022)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
    • Journal Title

      ACM Transactions on Asian and Low-Resource Language Information Processing

      Volume: Vol. 21, Issue 4, 68 Issue: 4 Pages: 1-29

    • DOI

      10.1145/3491065

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / International Joint Research
  • [Journal Article] SCTB-V2: the 2nd Version of the Chinese Treebank in the Scientific Domain (2022)

    • Author(s)/Presenter(s)
      Chenhui Chu, Zhuoyuan Mao, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi
    • Journal Title

      Language Resources and Evaluation

      Volume: Oct. 2022 Issue: 3 Pages: 1-15

    • DOI

      10.1007/s10579-022-09615-2

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / International Joint Research
  • [Presentation] GPT-RE: In-context Learning for Relation Extraction using Large Language Models (2023)

    • Author(s)/Presenter(s)
      Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li and Sadao Kurohashi
    • Conference
      Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
    • Related Report
      2023 Annual Research Report
    • International Conference
  • [Presentation] Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (2023)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu and Sadao Kurohashi
    • Conference
      Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
    • Related Report
      2023 Annual Research Report
    • International Conference
  • [Presentation] Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation (2023)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi
    • Conference
      Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation)
    • Related Report
      2023 Annual Research Report
    • International Conference
  • [Presentation] LEALLA: Learning Lightweight Language-agnostic Sentence Embedding with Knowledge Distillation (2023)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao and Tetsuji Nakagawa
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
    • Related Report
      2023 Annual Research Report
    • International Conference
  • [Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2023)

    • Author(s)/Presenter(s)
      Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023): Findings Volume
    • Related Report
      2023 Annual Research Report
    • International Conference
  • [Presentation] When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? (2022)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan and Sadao Kurohashi
    • Conference
      Findings of the Association for Computational Linguistics: NAACL 2022
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems (2022)

    • Author(s)/Presenter(s)
      Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng and Sadao Kurohashi
    • Conference
      Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation (2022)

    • Author(s)/Presenter(s)
      Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
    • Conference
      Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] Textual Enhanced Contrastive Learning for Solving Math Word Problems (2022)

    • Author(s)/Presenter(s)
      Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng and Sadao Kurohashi
    • Conference
      Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction (2022)

    • Author(s)/Presenter(s)
      Zhen Wan, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi and Jiwei Li
    • Conference
      Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] LEALLA: Learning Lightweight Language-agnostic Sentence Embedding with Knowledge Distillation (2022)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao and Tetsuji Nakagawa
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
    • Related Report
      2022 Annual Research Report
    • International Conference
  • [Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2022)

    • Author(s)/Presenter(s)
      Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi
    • Conference
      Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
    • Related Report
      2022 Annual Research Report
    • International Conference
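  • [Presentation] Efficiently Learning Multilingual Sentence Representation for Cross-lingual Sentence Classification (2022)

    • Author(s)/Presenter(s)
      Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
    • Conference
      The 29th Annual Meeting of the Association for Natural Language Processing (言語処理学会 第29回年次大会)
    • Related Report
      2022 Annual Research Report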
  • [Conference/Symposium Held] Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) (2023)

    • Related Report
      2023 Annual Research Report
  • [Conference/Symposium Held] The 24th Annual Conference of The European Association for Machine Translation (2023)

    • Related Report
      2023 Annual Research Report
  • [Conference/Symposium Held] Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)

    • Related Report
      2022 Annual Research Report


Published: 2022-04-28   Modified: 2024-12-25  
