2020 Fiscal Year Annual Research Report
Self-explainable and fast-to-train example-based machine translation using neural networks
Project/Area Number |
18K11447
|
Research Institution | Waseda University |
Principal Investigator |
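LEPAGE YVES Waseda University, Faculty of Science and Engineering (Graduate School and Research Center of Information, Production and Systems), Professor (70573608)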
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Keywords | natural language processing / example-based methods / neural and statistical methods |
Outline of Annual Research Achievements |
Work was conducted on using vector representations to solve analogies between sentences (1) monolingually and (2) bilingually.
(1) A neural model using vector representations was designed and tested. A monolingual dataset of analogies between short sentences was produced and publicly released, based on the dataset released in the second year. A paper published at an international conference won that conference's best paper award.
(2) Alignment between sentences in two languages, as in statistical machine translation, was extended (2.1) from crisp to soft alignment, using word embeddings, and (2.2) from monolingual alignment, as used for semantico-formal analogies, to bilingual alignment, using translation probabilities. Sub-sentential alignment and bilingual word embedding mapping were compared in an example-based machine translation experiment. A paper published at the French NLP conference won that conference's best paper award.
Experiments on analogical density were conducted to determine which segmentation units lead to denser or less dense corpora, as measured by the number of analogies. A research assistant was hired. Data with longer or shorter sentences were used. Preliminary results show that keeping the most frequent sub-words is the best choice for increasing analogical density. A paper will be submitted to a journal.
Work was also conducted on the retrieval of sentences that cover a given sentence formally and semantically. The combination of vector representations of sentences with exact matching was explored. A paper will be submitted to an international conference.
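The step from crisp to soft alignment in (2.1) can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the toy 2-dimensional embeddings, and the choice of cosine similarity as the soft score are all assumptions made for illustration. A crisp alignment scores a word pair 1 only when the two words are identical, whereas a soft alignment assigns a graded score derived from the word embeddings.

```python
import math

# Toy 2-dimensional word embeddings (hypothetical values, for illustration only).
EMBEDDINGS = {
    "cat": (1.0, 0.0),
    "kitten": (0.8, 0.6),
    "sleeps": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors given as tuples of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def crisp_alignment(source, target):
    """Alignment matrix with 1.0 for identical words and 0.0 otherwise."""
    return [[1.0 if s == t else 0.0 for t in target] for s in source]


def soft_alignment(source, target, embeddings=EMBEDDINGS):
    """Alignment matrix of cosine similarities between word embeddings.

    Assumes every word has an entry in `embeddings`.
    """
    return [[cosine(embeddings[s], embeddings[t]) for t in target] for s in source]


if __name__ == "__main__":
    src = ["cat", "sleeps"]
    tgt = ["kitten", "sleeps"]
    print(crisp_alignment(src, tgt))
    print(soft_alignment(src, tgt))
```

In this toy setting, "cat" and "kitten" receive a crisp score of zero because the strings differ, but a high soft score because their assumed embeddings point in similar directions. For the bilingual case in (2.2), the report states that translation probabilities are used; a table of such probabilities would take the place of the similarity score in this sketch.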
|
Remarks |
The data produced for the article presented at ICACSIS have been released. See tab: Experimental results, 2nd resource: a set of more than 5,000 analogies...
|
Research Products
(3 results)