Budget Amount |
¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
Fiscal Year 2017: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
|
Outline of Final Research Achievements |
The goals of the project were (1) to build tools that produce analogical clusters from monolingual data, (2) to use such clusters to produce quasi-parallel corpora, (3) to use these quasi-parallel corpora in addition to parallel corpora, and (4) to obtain improvements in translation accuracy in statistical machine translation (SMT). The tools were built and publicly released. In addition to what was announced in the research plan, a new data structure, the analogical grid, was introduced. Data were produced for languages ranging from morphologically poor to morphologically rich: 11 European languages (N-grams from single words to 6-grams), Chinese, Japanese (short sentences of fewer than 30 characters for SMT experiments), and additional languages (word forms in Arabic, Georgian, Navajo, Russian, Turkish, etc.). Part of the data has been publicly released. Various experiments showed that filtered quasi-parallel data produced by analogy can improve translation accuracy in Chinese-Japanese SMT.
|