Low-Resource Machine Translation Integrating Pre-training and Multilingual Semantic Representation Learning

Research Project

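Project/Area Number	22KJ1843
Project Number under Single-year Grants	22J13719 (2022)
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund (2023), Single-year Grants (2022)
Application Category	Domestic
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Kyoto University
Principal Investigator	Zhuoyuan Mao (毛 卓遠), Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC2)
Project Period (FY)	2023-03-08 – 2024-03-31
Project Status	Completed (FY 2023)
Budget Amount *Note	1,700 thousand yen (Direct Cost: 1,700 thousand yen); FY 2023: 800 thousand yen (Direct Cost: 800 thousand yen); FY 2022: 900 thousand yen (Direct Cost: 900 thousand yen)
Keywords	low-resource translation / sentence embedding / multilingual translation / multilingual embedding / model efficiency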
Outline of Research at the Start	As globalization progresses, demand for automatic multilingual language understanding and translation is rising sharply in many settings. We aim to tackle the technical barriers in low-resource machine translation (LMT) and to design a robust multilingual translation system that supports a large number of languages, including several low-resource ones. (A low-resource language is one for which the available data is insufficient to train a translation model.)
Outline of Annual Research Achievements	In the last fiscal year, we developed LEALLA, a state-of-the-art lightweight sentence embedding model. With this pre-trained sentence-level semantic model, new parallel corpora can be constructed more efficiently. We also analyzed the Transformer architecture for low-resource translation and published a paper at a top conference. Finally, we compiled all of this work into a thesis. Overall, this research explores multilingual representation learning, especially for low-resource translation, and addresses three challenges identified in this domain:
(1) To address the high computational demand that accompanies expanding the language coverage of multilingual models, we proposed an efficient and effective multilingual sentence embedding (MSE) model. We also introduced a new knowledge distillation method for training lightweight MSE models.
(2) To tackle data scarcity in low-resource languages, we proposed new pre-training objectives for low-resource neural machine translation (NMT). We also introduced word-level contrastive learning for low-resource NMT that uses statistical word alignments, and AlignInstruct, which improves the translation accuracy of large language models in low-resource languages.
(3) To address the limitations of the Transformer architecture in zero-shot NMT, we first proposed a new Transformer architecture that constructs interlingual representations on top of the Transformer encoder. We also comprehensively examined the effects of layer normalization in zero-shot NMT.

Reports

(2 results)

2023 Annual Research Report
2022 Annual Research Report

Research Products
(19 results)


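Journal Article (3 results) (of which International Joint Research: 2, Peer Reviewed: 3, Open Access: 2); Presentation (13 results) (of which International Conference: 12); Conference/Symposium Organized (3 results)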

[Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)
- Author(s)/Presenter(s)
  Song Haiyue, Mao Zhuoyuan, Dabre Raj, Chu Chenhui, Kurohashi Sadao
- Journal Title
  自然言語処理 (Journal of Natural Language Processing)
  Volume: 31 Issue: 1 Pages: 155-188
- DOI
  10.5715/jnlp.31.155
- ISSN
  1340-7619, 2185-8314
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation (2022)
- Author(s)/Presenter(s)
  Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
- Journal Title
  ACM Transactions on Asian and Low-Resource Language Information Processing
  Volume: 21 Issue: 4 Article: 68 Pages: 1-29
- DOI
  10.1145/3491065
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access / International Joint Research
[Journal Article] SCTB-V2: the 2nd Version of the Chinese Treebank in the Scientific Domain (2022)
- Author(s)/Presenter(s)
  Chenhui Chu, Zhuoyuan Mao, Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi
- Journal Title
  Language Resources and Evaluation
  Volume: Oct. 2022 Issue: 3 Pages: 1-15
- DOI
  10.1007/s10579-022-09615-2
- Related Report
  2022 Annual Research Report
- Peer Reviewed / International Joint Research
[Presentation] GPT-RE: In-context Learning for Relation Extraction using Large Language Models (2023)
- Author(s)/Presenter(s)
  Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li and Sadao Kurohashi
- Conference
  Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
- Related Report
  2023 Annual Research Report
- International Conference
[Presentation] Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (2023)
- Author(s)/Presenter(s)
  Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu and Sadao Kurohashi
- Conference
  Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
- Related Report
  2023 Annual Research Report
- International Conference
[Presentation] Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation (2023)
- Author(s)/Presenter(s)
  Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi
- Conference
  Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation)
- Related Report
  2023 Annual Research Report
- International Conference
[Presentation] LEALLA: Learning Lightweight Language-agnostic Sentence Embedding with Knowledge Distillation (2023)
- Author(s)/Presenter(s)
  Zhuoyuan Mao and Tetsuji Nakagawa
- Conference
  Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
- Related Report
  2023 Annual Research Report
- International Conference
[Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2023)
- Author(s)/Presenter(s)
  Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi
- Conference
  Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023): Findings Volume
- Related Report
  2023 Annual Research Report
- International Conference
[Presentation] When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? (2022)
- Author(s)/Presenter(s)
  Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan and Sadao Kurohashi
- Conference
  Findings of the Association for Computational Linguistics: NAACL 2022
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems (2022)
- Author(s)/Presenter(s)
  Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng and Sadao Kurohashi
- Conference
  Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation (2022)
- Author(s)/Presenter(s)
  Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
- Conference
  Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] Textual Enhanced Contrastive Learning for Solving Math Word Problems (2022)
- Author(s)/Presenter(s)
  Yibin Shen, Qianying Liu, Zhuoyuan Mao, Fei Cheng and Sadao Kurohashi
- Conference
  Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction (2022)
- Author(s)/Presenter(s)
  Zhen Wan, Qianying Liu, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi and Jiwei Li
- Conference
  Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] LEALLA: Learning Lightweight Language-agnostic Sentence Embedding with Knowledge Distillation (2022)
- Author(s)/Presenter(s)
  Zhuoyuan Mao and Tetsuji Nakagawa
- Conference
  Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Related Report
  2022 Annual Research Report
- International Conference
[Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2022)
- Author(s)/Presenter(s)
  Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi
- Conference
  Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
- Related Report
  2022 Annual Research Report
- International Conference
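[Presentation] Efficiently Learning Multilingual Sentence Representation for Cross-lingual Sentence Classification (2022)
- Author(s)/Presenter(s)
  Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi
- Conference
  言語処理学会第29回年次大会 (The 29th Annual Meeting of the Association for Natural Language Processing)
- Related Report
  2022 Annual Research Report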
[Conference/Symposium Organized] Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) (2023)
- Related Report
  2023 Annual Research Report
[Conference/Symposium Organized] The 24th Annual Conference of The European Association for Machine Translation (2023)
- Related Report
  2023 Annual Research Report
[Conference/Symposium Organized] Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
- Related Report
  2022 Annual Research Report
