2023 Fiscal Year Annual Research Report

Low-Resource Machine Translation through Multilingual Corpus Construction and Domain Adaptation

Research Project

Project/Area Number: 22KJ1724
Allocation Type: Multi-year Fund
Research Institution: National Institute of Information and Communications Technology

Principal Investigator

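Song Haiyue (宋 海越), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Advanced Translation Technology Laboratory, Technical Researcher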

Project Period (FY): 2023-03-08 – 2024-03-31
Keywords: machine translation / low-resource languages / subword segmentation / subword encoding / decoding algorithm / corpora creation
Outline of Annual Research Achievements

Our research focuses on improving machine translation in low-resource scenarios, such as translation between Asian languages and English and translation in specific domains such as education. To this end, we proposed to 1) create bilingual corpora for low-resource domains, mainly in the first year, and 2) optimize subword segmentation, during the encoding phase in the second year and during the decoding phase in the final year.
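As a rough illustration of why subword segmentation leaves room for optimization, a single word usually admits several valid segmentations under a fixed vocabulary. The sketch below is not the project's method: the function name `segmentations` and the toy vocabulary are hypothetical and chosen only for illustration.

```python
def segmentations(word, vocab):
    """Return every way to split `word` into pieces that all appear in `vocab`."""
    if not word:
        return [[]]  # one way to segment the empty string: no pieces
    results = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in vocab:
            # Keep this prefix and recursively segment the remainder.
            for rest in segmentations(word[end:], vocab):
                results.append([piece] + rest)
    return results


# Toy vocabulary (an assumption, not taken from any real model).
toy_vocab = {"u", "n", "un", "relate", "related", "d", "unrelated"}
candidates = segmentations("unrelated", toy_vocab)
for pieces in candidates:
    print(" ".join(pieces))
```

A translation model sees a different token sequence for each candidate, which is why the choice of segmentation at encoding or decoding time can affect translation quality.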
Regarding publications, in the last year 3 first-authored journal papers and 1 conference paper were published or submitted. Over the past three years, there have been 4 journal papers and 9 international conference papers, including co-authored papers. In addition, one patent application is underway.
This research has improved translation quality in low-resource scenarios: in our experiments, translation quality measured by BLEU improved by more than 3 points.
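For readers unfamiliar with the metric, BLEU is conventionally reported on a 0 to 100 scale, so a gain of 3 points is a difference of 3 on that scale. The following is a minimal, simplified sketch of single-reference sentence-level BLEU without smoothing. The report does not state which BLEU implementation or settings were used, so the function names and example sentences here are assumptions for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference BLEU on a 0-100 scale (no smoothing)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision yields BLEU = 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty lowers the score of hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(sentence_bleu("the cat sat on the mat", reference))
print(sentence_bleu("the cat sat on a mat", reference))
```

Published results are normally computed at the corpus level with a standardized tool, so scores from this sketch are not comparable to the figures in the report.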
Low-resource translation systems are indispensable for cross-cultural communication at international events such as EXPO 2025. Our approach makes such systems more practical for participants who speak low-resource languages.

  • Research Products

    (15 results)

Int'l Joint Research (1 result), Journal Article (2 results; Peer Reviewed: 2, Open Access: 2), Presentation (9 results; Int'l Joint Research: 8), Remarks (3 results)

  • [Int'l Joint Research] University of Cape Town (South Africa)

    • Country Name
      SOUTH AFRICA
    • Counterpart Institution
      University of Cape Town
  • [Journal Article] DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation (2024)

    • Author(s)
      Song Haiyue, Mao Zhuoyuan, Dabre Raj, Chu Chenhui, Kurohashi Sadao
    • Journal Title

      Journal of Natural Language Processing

      Volume: 31 Pages: 155~188

    • DOI

      10.5715/jnlp.31.155

    • Peer Reviewed / Open Access
  • [Journal Article] SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation (2023)

    • Author(s)
      Song Haiyue, Dabre Raj, Chu Chenhui, Kurohashi Sadao, Sumita Eiichiro
    • Journal Title

      ACM Transactions on Asian and Low-Resource Language Information Processing

      Volume: 22 Pages: 1~24

    • DOI

      10.1145/3610611

    • Peer Reviewed / Open Access
  • [Presentation] SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation (2024)

    • Author(s)
      Haiyue Song, Francois Meyer, Raj Dabre, Hideki Tanaka, Chenhui Chu, and Sadao Kurohashi.
    • Organizer
      The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024)
    • Int'l Joint Research
  • [Presentation] Linguistically Motivated Neural Machine Translation (2024)

    • Author(s)
      Haiyue Song, Hour Kaing, and Raj Dabre.
    • Organizer
      The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024)
    • Int'l Joint Research
  • [Presentation] NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages (2024)

    • Author(s)
      Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka.
    • Organizer
      The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
    • Int'l Joint Research
  • [Presentation] Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks (2024)

    • Author(s)
      Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara.
    • Organizer
      The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS2024)
    • Int'l Joint Research
  • [Presentation] Robust Neural Machine Translation for Abugidas by Glyph Perturbation (2024)

    • Author(s)
      Hour Kaing, Chenchen Ding, Haiyue Song, Jiannan Mao, Hideki Tanaka, and Masao Utiyama.
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing (言語処理学会 第30回年次大会)
  • [Presentation] GPT-RE: In-context Learning for Relation Extraction using Large Language Models (2023)

    • Author(s)
      Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi.
    • Organizer
      The 2023 Conference on Empirical Methods in Natural Language Processing
    • Int'l Joint Research
  • [Presentation] Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation (2023)

    • Author(s)
      Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi.
    • Organizer
      The 61st Annual Meeting of the Association for Computational Linguistics
    • Int'l Joint Research
  • [Presentation] Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation (2023)

    • Author(s)
      Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi.
    • Organizer
      Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation), held in conjunction with EAMT 2023.
    • Int'l Joint Research
  • [Presentation] Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision (2023)

    • Author(s)
      Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi.
    • Organizer
      The 17th Conference of the European Chapter of the Association for Computational Linguistics
    • Int'l Joint Research
  • [Remarks] Haiyue Song's Homepage

    • URL

      https://shyyhs.github.io/

  • [Remarks] Language Media Processing Laboratory (言語メディア研究室): List of Research Presentations

    • URL

      https://nlp.ist.i.kyoto-u.ac.jp/?%E7%A0%94%E7%A9%B6%E7%99%BA%E8%A1%A8%E4%B8%80%E8%A6%A7

  • [Remarks] Advanced Translation Technology Laboratory (先進的翻訳技術研究室): Papers

    • URL

      https://att-astrec.nict.go.jp/result/

Published: 2024-12-25  
