Multiple resource adaptation for low resource neural machine translation
Project/Area Number |
17H06822
|
Research Category |
Grant-in-Aid for Research Activity Start-up
|
Allocation Type | Single-year Grants |
Research Field |
Intelligent informatics
|
Research Institution | Osaka University |
Principal Investigator |
CHU CHENHUI Osaka University, Institute for Datability Science, Specially Appointed Assistant Professor (Full-time) (70784891)
|
Research Collaborator |
Dabre Raj
|
Project Period (FY) |
2017-08-25 – 2019-03-31
|
Project Status |
Completed (Fiscal Year 2018)
|
Budget Amount |
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2018: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | Neural machine translation / Domain adaptation / Low-resource / Machine translation / Multi-resource adaptation |
Outline of Final Research Achievements |
In Japan, translation needs are growing rapidly because of the sharp increase in foreign tourists and the hosting of the 2020 Tokyo Olympic Games, making machine translation (MT) indispensable. In MT, translation knowledge is acquired from parallel corpora (sentence-aligned bilingual texts). However, parallel corpora for most language pairs involving Japanese (e.g., Japanese-Indonesian) and for most domains (e.g., the medical domain) are very scarce (only tens of thousands of parallel sentences or fewer), so translation quality is unsatisfactory. Improving MT quality in this low-resource scenario is a challenging, unsolved problem. Our core idea is to improve low-resource neural MT (NMT) by adapting knowledge from multiple resources: parallel corpora of resource-rich languages (such as French-English) and resource-rich domains (such as the parliamentary domain), together with large-scale monolingual web corpora. Experiments show that multi-resource adaptation significantly improved low-resource MT.
|
Academic Significance and Societal Importance of the Research Achievements |
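With the development of neural machine translation (NMT) based on deep learning, state-of-the-art translation accuracy has been achieved when large-scale parallel corpora are available, but translation accuracy is known to be low when only a small parallel corpus is available. However, there are many situations in which large parallel corpora do not exist for a particular language pair or domain. For example, at the 2020 Tokyo Olympic Games, MT services in the sports domain from Japanese into Southeast Asian languages are expected to be very important, yet parallel corpora for those language pairs and that domain are small or almost nonexistent. The multi-resource adaptation proposed in this research succeeded in improving the translation accuracy of NMT in such low-resource settings, further advancing the practical use of MT.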
|
Report
(3 results)
Research Products
(16 results)