FY2018 Research Status Report

Self-explainable and fast-to-train example-based machine translation using neural networks

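Research Project

Project/Area Number: 18K11447
Research Institution: Waseda University

Principal Investigator

LEPAGE YVES  Waseda University, Faculty of Science and Engineering (Graduate School / Center of Information, Production and Systems), Professor (70573608)

Project Period (FY): 2018-04-01 – 2021-03-31
Keywords: natural language / machine translation / case-based reasoning / analogy / explainable AI
Outline of Annual Research Achievements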

Two example-based machine translation (EBMT) systems were implemented: the first follows the direct approach, and numerical methods were introduced in its retrieval and adaptation steps (1 paper at an international conference). The second follows the indirect approach (1 paper at an international conference). Several versions of the systems were implemented. An improved formalisation has been proposed (1 paper submitted to an international conference).
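The retrieval step of a direct-approach EBMT system can be illustrated with a minimal sketch. It assumes a token-level similarity from Python's standard `difflib`, which is not necessarily the numerical method used in the project; the translation memory, `similarity` and `retrieve` are hypothetical names and data introduced only for illustration.

```python
from difflib import SequenceMatcher

# Toy translation memory (hypothetical data, not taken from the project corpora).
TRANSLATION_MEMORY = [
    ("i like tea", "j'aime le thé"),
    ("i like coffee", "j'aime le café"),
    ("where is the station", "où est la gare"),
]


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] between two sentences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve(query: str, memory=TRANSLATION_MEMORY):
    """Return the (source, target) pair whose source side is most similar to the query."""
    return max(memory, key=lambda pair: similarity(query, pair[0]))
```

With this toy memory, `retrieve("where is the hotel")` selects the pair whose source is "where is the station". An adaptation step would then modify the retrieved target sentence; that step is not sketched here.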
Data were collected: the BTEC corpus proved too expensive to acquire, but contact was established with ATR for further enquiry. Data from the Tatoeba corpus were used in the experiments instead. Very large word embeddings (continuous vector representations of words) in several languages were downloaded and cleaned.
Experiments were conducted: for comparison, a statistical machine translation (SMT) system and a neural machine translation (NMT) system were built. All systems were tested on the same data to compare their translation accuracy. A GPU machine (DeepLearning Box) was acquired to run the experiments.
Work on analogy: the use of neural networks has been proposed for formal analogies, and experiments were conducted with the first EBMT system (1 paper at an international conference). An algorithm has been proposed for semantico-formal analogies between sentences using vector representations of words, and the data produced have been publicly released (1 paper at an international conference in the next fiscal year). Formal transformations that preserve analogies have been studied (1 paper at an international conference, best paper award).
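As an illustration of what a formal analogy between strings A : B :: C : D involves, the sketch below checks a commonly used necessary condition: for every character, its number of occurrences in A and D together equals its number of occurrences in B and C together. This is only a filter, not a full test, and it is not the neural method mentioned above; the function name is hypothetical.

```python
from collections import Counter


def satisfies_count_condition(a: str, b: str, c: str, d: str) -> bool:
    """Check, for every character, that count in a + d equals count in b + c.

    This is a necessary condition for the formal analogy a : b :: c : d,
    not a sufficient one.
    """
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)
```

For example, the check accepts walk : walked :: talk : talked and rejects walk : walked :: talk : talks. It also accepts ab : ba :: ab : ab, which is not an analogy, showing that the condition alone is not sufficient.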

Current Status of Research Progress (Category)

2: Progressing more or less smoothly

Reason

The plan is basically being followed.
In the first year, as planned, systems were built, data were collected and experiments were performed to compare translation accuracy. In addition, theoretical work was conducted on the formalisation of the EBMT system as a case-based reasoning system and on the formalisation of analogy.
As for data, the acquisition of the BTEC corpus was cancelled because it is too expensive.
Work delayed: work on the segmentation of parallel corpora with soft sub-sentential alignment, initially planned for the first fiscal year, is delayed and will be done during the second fiscal year.
Work done ahead of schedule: (1) Work on solving analogies between soft alignments of sequences of continuous representations of words, initially planned for the second fiscal year, has begun and will continue during the second fiscal year. The project is also ahead of schedule on the resolution of semantico-formal analogies between sentences, which was scheduled for the second year. A set of analogies between sentences extracted from Tatoeba has been produced and released on our global server. (2) Work on self-explanation of translation, initially planned for the third fiscal year, has already begun. It consists in tracing the execution of the translation process to explain the choices made by the system when translating. Console traces and interfaces have been implemented. Work is still needed to make them shorter, more user-friendly and more understandable.
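The idea of tracing execution to explain the system's choices can be sketched as follows. This is a minimal illustration under simplifying assumptions: the `Trace` class, the `translate_with_trace` function and the exact-match lookup are hypothetical and do not reproduce the project's console traces or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects human-readable steps taken while translating (hypothetical)."""
    steps: list = field(default_factory=list)

    def log(self, stage: str, detail: str) -> None:
        """Record one step, labelled with the stage that produced it."""
        self.steps.append(f"[{stage}] {detail}")

    def explain(self) -> str:
        """Return all recorded steps, one per line."""
        return "\n".join(self.steps)


def translate_with_trace(source: str, memory: dict, trace: Trace) -> str:
    """Toy exact-match lookup that records why it produced its output."""
    trace.log("retrieval", f"looking up {source!r} in {len(memory)} examples")
    if source in memory:
        trace.log("result", f"exact match found: {memory[source]!r}")
        return memory[source]
    trace.log("result", "no example matched; source returned unchanged")
    return source
```

Calling `trace.explain()` after a translation returns the recorded steps in order. Making such output shorter and easier to read is the remaining work noted above.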

Strategy for Future Research Activity

As announced in the project plan, the main theoretical research topic for the second fiscal year will be the resolution of analogies between soft representations of sentences using neural networks. This will be integrated into the EBMT systems. Experiments will be performed with bilingual analogies for the direct approach to EBMT and with monolingual analogies for the indirect approach. Monolingual and bilingual alignments and analogies will be used in the final translation system, which requires merging the direct and the indirect approaches.
A formalisation of EBMT as the local enrichment of a translation memory through the application of linguistic variations has been proposed. Linguistic variations in the source language and in the target language are extracted from analogical clusters. The selection of the most efficient variations is still an open problem.
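The enrichment of a translation memory by linguistic variations can be sketched in a few lines. The sketch makes a strong simplifying assumption: a variation is modelled as a single word replacement on the source side paired with a single word replacement on the target side. The report does not specify the actual representation, and `enrich` is a hypothetical name.

```python
def enrich(memory, variations):
    """Return new (source, target) pairs obtained by applying each variation once.

    A variation is ((src_old, src_new), (tgt_old, tgt_new)): replace one word
    on the source side and the corresponding word on the target side.
    This word-level model is an assumption made for illustration only.
    """
    known = set(memory)
    created = []
    for source, target in memory:
        src_tokens, tgt_tokens = source.split(), target.split()
        for (src_old, src_new), (tgt_old, tgt_new) in variations:
            # Apply the variation only if it matches on both sides.
            if src_old in src_tokens and tgt_old in tgt_tokens:
                new_pair = (
                    " ".join(src_new if t == src_old else t for t in src_tokens),
                    " ".join(tgt_new if t == tgt_old else t for t in tgt_tokens),
                )
                if new_pair not in known:
                    known.add(new_pair)
                    created.append(new_pair)
    return created
```

For a memory holding the single pair ("i like tea", "j'aime le thé") and the variation (("tea", "coffee"), ("thé", "café")), the function returns the new pair ("i like coffee", "j'aime le café"). Choosing which variations to apply is the open problem mentioned above and is not addressed by this sketch.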
The use of penalties to assess the quality of newly created sentence pairs resulting from the enrichment of the case base remains to be evaluated. To merge the direct and the indirect approaches to EBMT, the association of variations in the source language with variations in the target language (i.e., analogical clusters) will be studied.
Work on self-explanation will continue. Interfaces for the visualisation of the application of variations and of the enrichment of the translation memory will be improved. The existing explanations need to be shorter and more easily understandable by a standard user.

Causes of Carryover

The carried-over budget will be spent on participation in an international conference in Europe.

  • Research Products

    (5 results)

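All: Journal Article (1 result) (of which international joint research: 1, peer reviewed: 1, open access: 1), Presentation (3 results) (of which international conference: 2), Remarks (1 result)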

  • [Journal Article] Case-Based Translation: First Steps from a Knowledge-Light Approach Based on Analogy to a Knowledge-Intensive One (2018)

    • Author(s)
      Y. Lepage and J. Lieber
    • Journal Title

      ICCBR 2018, LNAI

      Volume: 11156 Pages: 273-288

    • DOI

      https://doi.org/10.1007/978-3-030-01081-2_37

    • Peer Reviewed / Open Access / International Joint Research
  • [Presentation] Numerical methods for retrieval and adaptation in Nagao’s EBMT model (2018)

    • Author(s)
      K. He, T. Zhao, and Y. Lepage
    • Conference
      2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS 2018)
    • International conference
  • [Presentation] String transformations preserving analogies (2018)

    • Author(s)
      Y. Lepage
    • Conference
      2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS 2018)
    • International conference
  • [Presentation] Context encoder for analogies on strings (2018)

    • Author(s)
      T. Zhao and Y. Lepage
    • Conference
      32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32)
  • [Remarks] Kakenhi Kiban C 18K11447

    • URL

      http://lepage-lab.ips.waseda.ac.jp/en/projects/kakenhi-kiban-c-18k11447/

Published: 2019-12-27
