Summary of Research Achievements |
In Japan, translation needs are growing rapidly because of the sharp increase in foreign tourists and the hosting of the 2020 Tokyo Olympic Games, making machine translation (MT) indispensable. In MT, translation knowledge is acquired from parallel corpora (sentence-aligned bilingual texts). However, because parallel corpora for most language pairs involving Japanese (e.g., Japanese-Indonesian) and for most domains (e.g., the medical domain) are very scarce (only tens of thousands of parallel sentences or fewer), translation quality is unsatisfactory. Improving MT quality in this low-resource scenario is a challenging, unsolved problem. The purpose of this research is to improve MT quality in this low-resource scenario using multiple resources, including parallel corpora of resource-rich language pairs (such as French-English) and domains (such as the parliamentary domain), as well as large-scale monolingual web corpora. In FY2017, we established model adaptation technologies using resource-rich language and domain parallel corpora. Specifically, we obtained the following achievements:
1. Single language/domain adaptation: We developed novel methods and conducted a comprehensive empirical comparison with previous studies. These achievements were published at ACL 2017 (the top conference in natural language processing) and have been accepted for publication in the Journal of Information Processing in June.
2. Multiple language/domain adaptation: We developed methods for domain adaptation using multilingual and multi-domain corpora, and presented this work at NLP 2018.
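To make the idea of domain adaptation with scarce in-domain data concrete, the toy sketch below illustrates one common family of adaptation strategies: starting from a model trained on a large out-of-domain corpus and continuing training on a small, upweighted in-domain corpus. This is a minimal, hypothetical illustration only, not the specific methods developed in this project; the word pairs, weights, and the unigram-lexicon "model" are all invented for exposition.

```python
# Toy illustration of corpus-weighting domain adaptation
# (a common NMT adaptation idea; NOT the project's actual method).
# The "translation model" here is just a unigram lexicon:
# source word -> {target word: score}.
from collections import defaultdict

def train_lexicon(parallel_pairs, lexicon=None, weight=1.0):
    """Accumulate weighted co-occurrence counts from word-aligned pairs.

    Passing an existing lexicon plus a higher weight for in-domain data
    mimics continued training (fine-tuning) on a new domain.
    """
    if lexicon is None:
        lexicon = defaultdict(lambda: defaultdict(float))
    for src, tgt in parallel_pairs:
        lexicon[src][tgt] += weight
    return lexicon

def translate(word, lexicon):
    """Pick the highest-scoring target word; copy unknown words through."""
    if word not in lexicon or not lexicon[word]:
        return word
    return max(lexicon[word], key=lexicon[word].get)

# Large out-of-domain corpus: "bank" usually means a financial bank.
out_of_domain = [("bank", "ginkou")] * 100 + [("bank", "teibou")] * 5
# Small in-domain (e.g., geography) corpus: "bank" means a riverbank.
in_domain = [("bank", "teibou")] * 30

lex = train_lexicon(out_of_domain)               # baseline model
lex = train_lexicon(in_domain, lex, weight=5.0)  # adapt: upweight in-domain data

print(translate("bank", lex))  # adapted model prefers the in-domain sense
```

With the invented counts above, upweighting the 30 in-domain pairs by 5 outweighs the 100 out-of-domain occurrences, so the adapted lexicon prefers the in-domain translation. Real NMT adaptation replaces the count table with neural model parameters, but the trade-off between large out-of-domain and small in-domain data is the same.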
|
Strategy for Future Research Activity |
We will study the remaining two topics as scheduled: data adaptation using large-scale monolingual web corpora, and integration of systems adapted with multiple resources. In our journal paper, to be published in the Journal of Information Processing in June, we have already conducted a comparison of previous studies on these two topics. In addition, we wrote a survey paper on domain adaptation for neural machine translation and submitted it to COLING 2018 (a top conference in natural language processing). We believe that these preliminary studies will help our research in FY2018 proceed smoothly.
|