FY2023 Annual Research Report

Low-Resource Machine Translation through Multilingual Corpus Construction and Domain Adaptation

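Research Project

Project/Area Number: 22KJ1724
Allocation Type: Multi-year Fund
Research Institution: National Institute of Information and Communications Technology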

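Principal Investigator

Haiyue Song (宋 海越), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Advanced Translation Technology Laboratory, Technical Researcher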

Project Period (FY): 2023-03-08 – 2024-03-31
Keywords: machine translation / low-resource languages / subword segmentation / subword encoding / decoding algorithm / corpora creation
Outline of Research Achievements

Our research focused on improving machine translation in low-resource scenarios, such as translation between Asian languages and English, and translation in specific domains such as education. To achieve this, we 1) created bilingual corpora for the low-resource domain, mainly in the first year, and 2) optimized subword segmentation information during the encoding phase in the second year and during the decoding phase in the last year.
In the last year, 3 first-authored journal papers and 1 conference paper were published or submitted. Over the past three years, the work has produced 4 journal papers and 9 international conference papers, including co-authored papers. One patent application is also underway.
This research has significantly improved translation quality in low-resource scenarios. In our experiments, translation quality measured by BLEU improved by more than 3 points.
Low-resource translation systems are indispensable for cross-cultural communication at international events such as EXPO 2025. Our approach makes such systems more practical for participants who speak low-resource languages.
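
As background for the subword segmentation work mentioned in the summary above, the following minimal sketch shows why segmentation is a choice at all: under a given subword vocabulary, one word can usually be split in several valid ways. This is not the method of any paper listed below; the function name `segmentations` and the toy vocabulary are assumptions made purely for illustration.

```python
from functools import lru_cache


def segmentations(word, vocab):
    """Return every way to split `word` into pieces that all belong to `vocab`.

    Illustrative helper (hypothetical, not taken from the reported work).
    """

    @lru_cache(maxsize=None)
    def split_from(start):
        # An empty remainder has exactly one segmentation: no pieces.
        if start == len(word):
            return [[]]
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in vocab:
                # Extend each segmentation of the remainder with this piece.
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return split_from(0)


# Toy vocabulary, assumed for illustration only.
toy_vocab = {"un", "related", "relat", "ed", "u", "n"}
candidates = segmentations("unrelated", toy_vocab)

for pieces in candidates:
    print(" ".join(pieces))
```

Every candidate concatenates back to the same surface word, so a translation model that treats them as unrelated token sequences sees the same input in several different forms. Which candidate, or which combination of candidates, the model uses during encoding and decoding is the kind of choice the summary refers to.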

  • Research Products

    (15 results)

International Joint Research (1) / Journal Articles (2; peer-reviewed: 2, open access: 2) / Presentations (9; international conferences: 8) / Remarks (3)

  • [International Joint Research] University of Cape Town (South Africa)

    • Country
      South Africa
    • Foreign Institution
      University of Cape Town
  • [Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)

    • Authors/Presenters
      Song Haiyue, Mao Zhuoyuan, Dabre Raj, Chu Chenhui, Kurohashi Sadao
    • Journal Title

      Journal of Natural Language Processing

      Volume: 31 Pages: 155-188

    • DOI

      10.5715/jnlp.31.155

    • Peer Reviewed / Open Access
  • [Journal Article] SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation (2023)

    • Authors/Presenters
      Song Haiyue, Dabre Raj, Chu Chenhui, Kurohashi Sadao, Sumita Eiichiro
    • Journal Title

      ACM Transactions on Asian and Low-Resource Language Information Processing

      Volume: 22 Pages: 1-24

    • DOI

      10.1145/3610611

    • Peer Reviewed / Open Access
  • [Presentation] SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation (2024)

    • Authors/Presenters
      Haiyue Song, Francois Meyer, Raj Dabre, Hideki Tanaka, Chenhui Chu, and Sadao Kurohashi.
    • Organizer/Conference
      The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024)
    • International Conference
  • [Presentation] Linguistically Motivated Neural Machine Translation (2024)

    • Authors/Presenters
      Haiyue Song, Hour Kaing, and Raj Dabre.
    • Organizer/Conference
      The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024)
    • International Conference
  • [Presentation] NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages (2024)

    • Authors/Presenters
      Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka.
    • Organizer/Conference
      The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
    • International Conference
  • [Presentation] Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks (2024)

    • Authors/Presenters
      Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara.
    • Organizer/Conference
      The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS2024)
    • International Conference
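  • [Presentation] Robust Neural Machine Translation for Abugidas by Glyph Perturbation (2024)

    • Authors/Presenters
      Hour Kaing, Chenchen Ding, Haiyue Song, Jiannan Mao, Hideki Tanaka, and Masao Utiyama.
    • Organizer/Conference
      The 30th Annual Meeting of the Association for Natural Language Processing (言語処理学会 第30回年次大会)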
  • [Presentation] GPT-RE: In-context Learning for Relation Extraction using Large Language Models (2023)

    • Authors/Presenters
      Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi.
    • Organizer/Conference
      The 2023 Conference on Empirical Methods in Natural Language Processing
    • International Conference
  • [Presentation] Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (2023)

    • Authors/Presenters
      Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi.
    • Organizer/Conference
      The 61st Annual Meeting of the Association for Computational Linguistics
    • International Conference
  • [Presentation] Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation (2023)

    • Authors/Presenters
      Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi.
    • Organizer/Conference
      Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation) held in conjunction with EAMT2023.
    • International Conference
  • [Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2023)

    • Authors/Presenters
      Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi.
    • Organizer/Conference
      The 17th Conference of the European Chapter of the Association for Computational Linguistics
    • International Conference
  • [Remarks] Haiyue Song's Homepage

    • URL

      https://shyyhs.github.io/

  • [Remarks] Language Media Processing Lab (言語メディア研究室): List of Research Presentations

    • URL

      https://nlp.ist.i.kyoto-u.ac.jp/?%E7%A0%94%E7%A9%B6%E7%99%BA%E8%A1%A8%E4%B8%80%E8%A6%A7

  • [Remarks] Advanced Translation Technology Laboratory (先進的翻訳技術研究室): Papers

    • URL

      https://att-astrec.nict.go.jp/result/

Published: 2024-12-25
