Semi-supervised Machine Translation based on Quality Estimation

Research Project

Project/Area Number 20K19861
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution Ehime University

Principal Investigator

Kajiwara Tomoyuki  Ehime University, Graduate School of Science and Engineering (Engineering), Lecturer (70824960)

Project Period (FY) 2020-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2023: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2022: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2021: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords Machine Translation / Quality Estimation / Reinforcement Learning / Natural Language Processing / Intelligent Informatics
Outline of Research at the Start

Improving the accuracy of machine translation is an urgent task in the run-up to the Tokyo Olympics and the Osaka Expo. In conventional machine translation training, the quality of the translation output by the model is evaluated by comparing it superficially, word by word, with the reference sentence, and this evaluation is fed back to the model. However, such a method can unfairly assign low scores to good translations that do not superficially match the reference. In this research, we instead evaluate a translation by comparing it semantically with the input sentence at the sentence level, and use that evaluation to train the translator. Because this approach can train a translator without parallel data, it is expected to improve medium-quality machine translation in low-resource and unsupervised settings and to contribute to the multilingual deployment of various services.
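
The core idea, reference-free evaluation of a translation against its own source sentence, can be sketched as follows. This is a minimal illustration assuming the LaBSE multilingual encoder from the sentence-transformers library; the encoders and scoring methods actually developed in this project may differ.

```python
from sentence_transformers import SentenceTransformer, util

# Multilingual sentence encoder (illustrative choice, not necessarily
# the encoder used in this project).
encoder = SentenceTransformer("sentence-transformers/LaBSE")

def quality_estimate(source: str, translation: str) -> float:
    """Score a translation by cross-lingual embedding similarity to the source.

    No reference translation is required, so the score can be computed
    even where no parallel data exists.
    """
    src_emb, hyp_emb = encoder.encode([source, translation], convert_to_tensor=True)
    return util.cos_sim(src_emb, hyp_emb).item()

print(quality_estimate("今日は良い天気です。", "The weather is nice today."))
```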

Outline of Final Research Achievements

Improving machine translation performance is an urgent task in the run-up to the Osaka Expo. In this research, we worked on quality estimation, i.e., automatic evaluation of the sentences output by natural language processing models without reference sentences, and also trained machine translation and other text generation models by reinforcement learning with such quality estimation as the reward. For the former, we proposed a series of unsupervised quality estimation methods based on multilingual sentence encoders and achieved better correlation with human evaluation than existing methods. For the latter, we improved the quality of the generated text by reinforcement learning that employs quality estimation as the reward function for machine translation and text simplification.
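
The reinforcement learning side can be sketched in the same spirit: sample a translation from the current model, score it with a reference-free quality estimator, and use the score as the reward in a policy-gradient (REINFORCE) update. The Marian model Helsinki-NLP/opus-mt-ja-en and the embedding-similarity reward below are assumptions for illustration, not the exact setup reported here.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer
from sentence_transformers import SentenceTransformer, util

mt_name = "Helsinki-NLP/opus-mt-ja-en"  # illustrative translation model
tokenizer = MarianTokenizer.from_pretrained(mt_name)
model = MarianMTModel.from_pretrained(mt_name)
encoder = SentenceTransformer("sentence-transformers/LaBSE")  # illustrative QE encoder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def qe_reward(source: str, hypothesis: str) -> float:
    """Reference-free reward: cross-lingual embedding similarity to the source."""
    src_emb, hyp_emb = encoder.encode([source, hypothesis], convert_to_tensor=True)
    return util.cos_sim(src_emb, hyp_emb).item()

def reinforce_step(source: str) -> None:
    inputs = tokenizer(source, return_tensors="pt")
    # Sample a translation from the current policy.
    sampled = model.generate(**inputs, do_sample=True, max_new_tokens=64)
    hypothesis = tokenizer.decode(sampled[0], skip_special_tokens=True)
    reward = qe_reward(source, hypothesis)
    # Recompute the negative log-likelihood of the sampled sequence with gradients.
    labels = sampled.clone()
    labels[labels == tokenizer.pad_token_id] = -100
    nll = model(**inputs, labels=labels).loss
    # REINFORCE without a baseline: minimizing reward-weighted NLL raises the
    # probability of translations that the quality estimator scores highly.
    (reward * nll).backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice a baseline reward (for example, the score of a greedy translation) is usually subtracted to reduce the variance of this estimator.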

Academic Significance and Societal Importance of the Research Achievements

Improving the accuracy of machine translation is an urgent task in the run-up to the Osaka Expo. In conventional machine translation training, a deep learning model is typically optimized to maximize the word-level agreement between the output and the reference sentence, so outputs that differ superficially from the reference are penalized even when they are semantically correct. In contrast to conventional methods, whose feedback is based on word-level surface evaluation, this research trains machine translation with feedback based on sentence-level semantic evaluation. By reducing dependence on the particular wording of the reference, this method enables more flexible training and has the potential to improve the performance of a wide range of text generation technologies, machine translation among them. In this research, we verified its effectiveness on machine translation and text simplification.
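
A small example of why surface matching penalizes legitimate rewording, assuming sacrebleu for the word-level score and the same LaBSE encoder for the sentence-level score; the sentence pair is invented for illustration.

```python
import sacrebleu
from sentence_transformers import SentenceTransformer, util

reference  = "He canceled the meeting because of the storm."
hypothesis = "The meeting was called off due to the storm."

# Word-level surface score: low, because few n-grams match the reference.
print(sacrebleu.sentence_bleu(hypothesis, [reference]).score)

# Sentence-level semantic score: high, because the meaning is preserved.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
ref_emb, hyp_emb = encoder.encode([reference, hypothesis], convert_to_tensor=True)
print(util.cos_sim(ref_emb, hyp_emb).item())
```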

Report

(5 results)
  • 2023 Annual Research Report
  • Final Research Report (PDF)
  • 2022 Research-status Report
  • 2021 Research-status Report
  • 2020 Research-status Report
Research Products

(9 results)

Journal Article: 2 results (of which Peer Reviewed: 2, Open Access: 2) / Presentation: 7 results (of which Int'l Joint Research: 6)

  • [Journal Article] Unsupervised Quality Estimation via Multilingual Denoising Autoencoder (2022)

    • Author(s)
      西原哲郎, 岩本裕司, 吉仲真人, 梶原智之, 荒瀬由紀, 二宮崇
    • Journal Title

      Journal of Natural Language Processing

      Volume: 29 Issue: 2 Pages: 669-687

    • DOI

      10.5715/jnlp.29.669

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Optimization of Reference-less Evaluation Metric of Grammatical Error Correction for Manual Evaluations (2021)

    • Author(s)
      吉村綾馬, 金子正弘, 梶原智之, 小町守
    • Journal Title

      Journal of Natural Language Processing

      Volume: 28 Issue: 2 Pages: 404-427

    • DOI

      10.5715/jnlp.28.404

    • NAID

      130008052579

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Unsupervised Translation Quality Estimation Exploiting Synthetic Data and Pre-trained Multilingual Encoder (2023)

    • Author(s)
      Yuto Kuroda, Atsushi Fujita, Tomoyuki Kajiwara, Takashi Ninomiya
    • Organizer
      arXiv:2311.05117
    • Related Report
      2023 Annual Research Report
  • [Presentation] Adversarial Training on Disentangling Meaning and Language Representations for Unsupervised Quality Estimation (2022)

    • Author(s)
      Yuto Kuroda, Tomoyuki Kajiwara, Yuki Arase, Takashi Ninomiya
    • Organizer
      Proceedings of the 29th International Conference on Computational Linguistics
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] Comparing BERT-based Reward Functions for Deep Reinforcement Learning in Machine Translation (2022)

    • Author(s)
      Yuki Nakatani, Tomoyuki Kajiwara, Takashi Ninomiya
    • Organizer
      Proceedings of the 9th Workshop on Asian Translation
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] Language-agnostic Representation from Multilingual Sentence Encoders for Cross-lingual Similarity Estimation (2021)

    • Author(s)
      Nattapong Tiyajamorn, Tomoyuki Kajiwara, Yuki Arase, Makoto Onizuka
    • Organizer
      Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Text Simplification with Reinforcement Learning using Supervised Rewards on Grammaticality, Meaning Preservation, and Simplicity (2020)

    • Author(s)
      Akifumi Nakamachi, Tomoyuki Kajiwara, Yuki Arase
    • Organizer
      Proceedings of the AACL-IJCNLP 2020 Student Research Workshop
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction (2020)

    • Author(s)
      Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara, Mamoru Komachi
    • Organizer
      Proceedings of the 28th International Conference on Computational Linguistics
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] TMUOU Submission for WMT20 Quality Estimation Shared Task (2020)

    • Author(s)
      Akifumi Nakamachi, Hiroki Shimanaka, Tomoyuki Kajiwara, Mamoru Komachi
    • Organizer
      Proceedings of the Fifth Conference on Machine Translation
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research

Published: 2020-04-28   Modified: 2025-01-30  
