2018 Fiscal Year Research-status Report
Self-explainable and fast-to-train example-based machine translation using neural networks
Project/Area Number | 18K11447 |
Research Institution | Waseda University |
Principal Investigator | LEPAGE Yves, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Professor (70573608) |
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | natural language / machine translation / case-based reasoning / analogy / explainable AI |
Outline of Annual Research Achievements
Two example-based machine translation (EBMT) systems were implemented. The first EBMT system follows the direct approach; numerical approaches were introduced for adaptation and retrieval (1 paper at an international conference). The second EBMT system follows the indirect approach (1 paper at an international conference). Several versions of the systems were implemented, and a better formalisation has been proposed (1 paper submitted to an international conference).

Data were collected. The BTEC corpus proved too expensive, but contacts with ATR were established for further enquiry; in experiments, data from the Tatoeba corpus were used instead. Very large word embeddings (continuous vector representations of words) in several languages were downloaded and cleaned up.

Experiments were conducted. For comparison, an SMT system and an NMT system were built, and all systems were tested on the same data to assess their respective translation accuracies. A GPU machine, a DeepLearning Box, was acquired to run the experiments.

Work on analogy: the use of neural networks has been proposed for formal analogies, and experiments were conducted with the first EBMT system (1 paper at an international conference). An algorithm has been proposed for semantico-formal analogies between sentences using vector representations of words; the produced data have been publicly released (1 paper at an international conference in the next fiscal year). Formal transformations which preserve analogies have been studied (1 paper at an international conference, best paper award).
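The semantic side of analogies between words is commonly handled with the vector-offset method over word embeddings: solve A : B :: C : x by searching for the word whose vector is closest to B - A + C. The following is a minimal sketch of that standard technique, with toy illustrative vectors rather than the project's actual embeddings:

```python
import numpy as np

# Toy word vectors (illustrative values only; real experiments would use
# large pre-trained multilingual embeddings).
vectors = {
    "man":   np.array([1.0, 0.0, 0.1]),
    "woman": np.array([1.0, 1.0, 0.1]),
    "king":  np.array([0.2, 0.0, 1.0]),
    "queen": np.array([0.2, 1.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(a, b, c, vectors):
    """Solve a : b :: c : x by the vector-offset method:
    x = argmax over the vocabulary of cosine(v_x, v_b - v_a + v_c),
    excluding the three given terms."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(solve_analogy("man", "woman", "king", vectors))  # -> queen
```

Extending this from single words to sentences (sequences of word vectors, possibly softly aligned) is precisely the harder problem the project addresses.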
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
The plan has largely been kept. In the 1st year, as planned, systems were built, data were collected, and experiments were performed to compare translation accuracy. In addition, theoretical work was conducted on the formalisation of the EBMT system as a case-based reasoning system and on the formalisation of analogy. As for data, the acquisition of the BTEC corpus was cancelled because it proved too expensive.

Work delayed: although initially planned for the 1st fiscal year, work on the segmentation of parallel corpora with soft sub-sentential alignment has been postponed to the 2nd fiscal year.

Work done in advance of schedule: (1) Although initially planned for the 2nd fiscal year, work has begun on solving analogies between soft alignments of sequences of continuous representations of words; it will continue during the 2nd fiscal year. The project is also ahead on the resolution of semantico-formal analogies between sentences, which was scheduled for the 2nd year: a set of analogies between sentences extracted from Tatoeba has been produced and released on our global server. (2) Although initially planned for the 3rd fiscal year, work on self-explanation of translation has already begun. It consists in tracing the execution of the translation process to explain the choices made by the system when translating. Console traces and interfaces have been implemented; further work is needed to make them shorter, more user-friendly, and more understandable.
Strategy for Future Research Activity
As announced in the project plan, the main theoretical topic of research for the 2nd fiscal year will be the resolution of analogies between soft representations of sentences using neural networks. This will be integrated into the EBMT systems. Experiments will be performed with bilingual analogies for the direct approach to EBMT, and with monolingual analogies for the indirect approach. Monolingual and bilingual alignments and analogies will be used in the final translation system, which requires merging the direct and indirect approaches.

A formalisation of EBMT as the local enrichment of a translation memory by the application of linguistic variations has been proposed. Linguistic variations in the source language and in the target language are extracted from analogical clusters. The selection of the most effective variations is still an open problem, and the use of penalties to assess the quality of newly created sentence pairs resulting from the enrichment of the case base remains to be evaluated. To merge the direct and indirect approaches to EBMT, how to associate variations in the source language with variations in the target language (i.e., analogical clusters) will be studied.

Work on self-explanation will continue. Interfaces for visualising the application of variations and the enrichment of the translation memory will be improved; the existing explanations need to be shorter and more easily understandable by a standard user.
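For formal analogies between character strings, a well-known necessary (but not sufficient) condition can serve as a cheap filter when checking candidate analogies such as those in analogical clusters: in a : b :: c : d, every character must occur as many times in a and d together as in b and c together. A minimal illustrative sketch of this filter (not the project's actual implementation):

```python
from collections import Counter

def may_be_formal_analogy(a, b, c, d):
    """Necessary (not sufficient) condition for the formal analogy
    a : b :: c : d between character strings: for every character x,
    |a|_x + |d|_x == |b|_x + |c|_x, where |s|_x is the number of
    occurrences of x in s."""
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)

# The condition holds for a genuine analogy ...
print(may_be_formal_analogy("walk", "walked", "talk", "talked"))  # -> True
# ... and rules out a non-analogy.
print(may_be_formal_analogy("walk", "walked", "talk", "spoke"))   # -> False
```

Strings passing the filter still require a full verification (or resolution) procedure; the filter only discards impossible candidates early.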
Causes of Carryover
The carried-over budget will be spent on participation in an international conference in Europe.
|
Research Products (5 results)