2023 Fiscal Year Final Research Report

Semi-supervised Machine Translation based on Quality Estimation

Project/Area Number	20K19861
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	Ehime University
Principal Investigator	Kajiwara Tomoyuki Ehime University, Graduate School of Science and Engineering (Engineering), Lecturer (70824960)
Project Period (FY)	2020-04-01 – 2024-03-31
Keywords	Machine translation / Quality estimation / Reinforcement learning
Outline of Final Research Achievements	Improving machine translation performance is an urgent task ahead of the Osaka Expo. In this research, we worked on two problems: quality estimation, which automatically evaluates the output sentences of natural language processing models without reference sentences, and the training of machine translation and other text generation models by reinforcement learning with quality estimation as the reward. For the former, we proposed a series of unsupervised quality estimation methods based on multilingual sentence encoders and achieved a higher correlation with human evaluation than existing methods. For the latter, we improved the quality of generated text in machine translation and text simplification through reinforcement learning that uses quality estimation as the reward function.
Free Research Field	Natural language processing
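Academic Significance and Societal Importance of the Research Achievements	With the Osaka Expo approaching, improving the accuracy of machine translation is an urgent task. Conventional machine translation training generally optimizes a deep learning model to maximize the word-level agreement between the output sentence and the reference sentence, so an output that differs from the reference on the surface is penalized even when it is semantically correct. In contrast to conventional methods, which give feedback based on word-level, surface-level evaluation, this research trains machine translation with feedback based on sentence-level, semantic evaluation. By reducing the dependence on the wording of the reference sentence and allowing more flexible training, this approach has the potential to improve the performance of a range of text generation technologies, including machine translation. In this research, we verified its effectiveness on machine translation and text simplification.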
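
The idea of replacing word-level feedback with a sentence-level, reference-free reward can be illustrated with a minimal sketch. It is not the project's implementation. It assumes that the quality estimate is the cosine similarity between the source and output embeddings from a multilingual sentence encoder, and that the reward enters training through a REINFORCE-style loss; the report specifies neither choice. The function names (`cosine_similarity`, `qe_reward`, `reinforce_loss`) and the toy vectors are hypothetical, with hand-written numbers standing in for real encoder outputs and model log-probabilities.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def qe_reward(src_emb: Sequence[float], hyp_emb: Sequence[float]) -> float:
    """Reference-free quality score (assumption): similarity between the
    source-sentence embedding and the output-sentence embedding."""
    return cosine_similarity(src_emb, hyp_emb)


def reinforce_loss(token_log_probs: Sequence[float],
                   reward: float,
                   baseline: float = 0.0) -> float:
    """REINFORCE-style loss for one sampled output sentence:
    -(reward - baseline) * sum of token log-probabilities."""
    return -(reward - baseline) * sum(token_log_probs)


if __name__ == "__main__":
    # Toy embeddings standing in for multilingual encoder outputs (hypothetical).
    src = [0.9, 0.1, 0.3]
    hyp_good = [0.8, 0.2, 0.3]   # semantically close output
    hyp_bad = [-0.1, 0.9, -0.4]  # semantically distant output

    # Toy token log-probabilities of a sampled output (hypothetical).
    log_probs = [-0.2, -0.5, -0.1]

    for name, hyp in [("good", hyp_good), ("bad", hyp_bad)]:
        r = qe_reward(src, hyp)
        print(name, "reward:", round(r, 3),
              "loss:", round(reinforce_loss(log_probs, r, baseline=0.5), 3))
```

In this sketch the reward is computed once per sentence, so an output that is worded differently from any reference is not penalized as long as its embedding stays close to that of the source. Subtracting a baseline is a common variance-reduction choice in policy-gradient training and is likewise an assumption here.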