Summary of Research Achievements |
Year 3 was dedicated to further experiments in generating new words in analogical grids, an additional promising data structure. The notion of saturation of a grid is important for better identifying new words or phrases that can be created by analogy. Experiments on the quality of newly created word forms were conducted in a morphologically rich language. The experiments announced in the research plan, which aimed to predict the quality of newly created word forms in several languages (English, Finnish, German and Indonesian), showed that Fisher's test can help control the quality of newly generated word forms. Tools and data produced during the research will be officially announced at the LREC 2018 conference: fast tools for computing analogical clusters and grids, together with analogical clusters and grids for N-grams in 11 European languages (these have been available on the project web site since year 2). In addition, a large data set of 65 million formal analogies between word forms in 10 languages will be released. Regarding dissemination of results, the principal investigator will deliver an invited talk at an LREC workshop presenting the results of the research: (1) fast production of analogical clusters from monolingual data, (2) their use in the production of quasi-parallel data, (3) the use of such data in statistical machine translation, and (4) a synthesis of the improvements in translation accuracy. The principal investigator also gave two invited presentations, one at a meeting on machine translation in France and one at an international conference on natural language processing in Morocco.
|