Language productivity: fast extraction of productive analogical clusters and their evaluation using statistical machine translation

Research Project

Project/Area Number 15K00317
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Research Field Intelligent informatics
Research Institution Waseda University

Principal Investigator

LEPAGE YVES  Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems / Center), Professor (70573608)

Research Collaborator YANG Wei  
FAM Rashel  
SUSANTI GOJALI  
Project Period (FY) 2015-04-01 – 2018-03-31
Project Status Completed (Fiscal Year 2017)
Budget Amount
¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
Fiscal Year 2017: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords Natural language processing / Artificial intelligence / Data structures / Morphologically rich languages / Chinese and Japanese
Outline of Final Research Achievements

The project had four goals: (1) to build tools that produce analogical clusters from monolingual data; (2) to use such clusters to produce quasi-parallel corpora; (3) to use these quasi-parallel corpora in addition to parallel corpora; and (4) to thereby improve translation accuracy in statistical machine translation (SMT).
Tools were built and publicly released. In addition to what was announced in the research plan, a new data structure, the analogical grid, was introduced. Data were produced for languages ranging from morphologically poor to morphologically rich: 11 European languages (n-grams from single words to 6-grams), Chinese and Japanese (short sentences of fewer than 30 characters for SMT experiments), and additional languages (word forms in Arabic, Georgian, Navajo, Russian, Turkish, etc.). Part of the data has been publicly released.
Various experiments showed that filtered quasi-parallel data produced by analogy can improve translation accuracy in Chinese-Japanese SMT.
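
As a rough illustration of what an analogical cluster is, the sketch below groups word pairs that share the same character-count difference, so that any two pairs in a group are candidates for a proportional analogy such as walk : walked :: talk : talked. This is not the project's released tooling. The function names `count_difference` and `candidate_clusters` are hypothetical, and equality of character-count differences is assumed here only as a coarse necessary condition for analogy between strings; it does not check character order, so a real extractor would need further constraints.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_difference(a: str, b: str) -> tuple:
    """Character-count difference between a and b, as sorted (char, delta) pairs.

    Hypothetical helper: only a necessary condition for an analogy,
    since it ignores the order of characters.
    """
    diff = Counter(a)
    diff.subtract(Counter(b))  # keeps negative counts, unlike the '-' operator
    return tuple(sorted((ch, n) for ch, n in diff.items() if n != 0))


def candidate_clusters(words):
    """Group word pairs (a, b) that share the same character-count difference."""
    groups = defaultdict(list)
    for a, b in combinations(sorted(set(words)), 2):
        groups[count_difference(a, b)].append((a, b))
    # A single pair cannot form an analogy, so keep groups of two or more pairs.
    return {key: pairs for key, pairs in groups.items() if len(pairs) >= 2}


if __name__ == "__main__":
    words = ["walk", "walked", "talk", "talked", "jump", "jumped"]
    for key, pairs in candidate_clusters(words).items():
        print(key, pairs)
```

Enumerating all pairs is quadratic in the vocabulary size, which is acceptable for a toy word list but is presumably the kind of cost that a fast extraction method has to avoid on real corpora.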

Report

(4 results)
  • 2017 Annual Research Report   Final Research Report ( PDF )
  • 2016 Research-status Report
  • 2015 Research-status Report
  • Research Products

    (19 results)

All Journal Article (2 results) (of which Peer Reviewed: 2 results,  Open Access: 2 results,  Acknowledgement Compliant: 2 results) Presentation (15 results) (of which Invited: 3 results) Remarks (2 results)

  • [Journal Article] Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation (2017)

    • Author(s)
      W. Yang, H. Shen, and Y. Lepage
    • Journal Title

      Journal of Information Processing

      Volume: 25 Issue: 0 Pages: 88-99

    • DOI

      10.2197/ipsjjip.25.88

    • NAID

      130005292406

    • ISSN
      1882-6652
    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Open Access / Acknowledgement Compliant
  • [Journal Article] A method of generating translations of unseen n-grams by using proportional analogy (2016)

    • Author(s)
      J. Luo and Y. Lepage
    • Journal Title

      IEEJ Transactions in Electronics, Information and Systems

      Volume: 11 Issue: 3 Pages: 325-330

    • DOI

      10.1002/tee.22221

    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Open Access / Acknowledgement Compliant
  • [Presentation] Plausibility of word forms generated from analogical grids in Indonesian (2018)

    • Author(s)
      R. Fam, A. Purwarianti, and Y. Lepage
    • Organizer
      Proceedings of the 16th International Conference on Computer Applications (ICCA 2018), pages 179-184, Yangon, Myanmar, February 2018.
    • Related Report
      2017 Annual Research Report
  • [Presentation] Validating analogically generated Indonesian words using Fisher’s exact test (2018)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the 24th Annual Meeting of the Japanese Association for Natural Language Processing, pages 312-315, Okayama, Japan, March 2018.
    • Related Report
      2017 Annual Research Report
  • [Presentation] Automatic Production of Quasi-parallel Corpora for Machine Translation (2018)

    • Author(s)
      Y. Lepage
    • Organizer
      International Conference on Natural Language, Signal and Speech Processing 2017, Casablanca, Morocco, 6-7 December 2017
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Quasi-Parallel Corpora: Hallucinating Translations for the Chinese-Japanese Language Pair (2018)

    • Author(s)
      Y. Lepage
    • Organizer
      BUCC workshop co-located with LREC 2018, Miyazaki, Japan, May 2018
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Indonesian unseen words explained by form, morphology and distributional semantics at the same time (2017)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the 23rd Annual Meeting of the Association for Natural Language Processing (NLP2017), pages 178-181.
    • Place of Presentation
      University of Tsukuba
    • Year and Date
      2017-03-14
    • Related Report
      2016 Research-status Report
  • [Presentation] A study in explaining unseen words in Indonesian using analogical clusters (2017)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the 15th International Conference on Computer Applications (ICCA 2017), pages 416-421.
    • Place of Presentation
      Yangon, Myanmar
    • Year and Date
      2017-02-16
    • Related Report
      2016 Research-status Report
  • [Presentation] Character-position arithmetic for analogy questions between word forms (2017)

    • Author(s)
      Y. Lepage
    • Organizer
      Proceedings of the Computational Analogy Workshop at the 25th International Conference on Case-Based Reasoning (ICCBR-17), pages 17-26, Trondheim, Norway, August 2017.
    • Related Report
      2017 Annual Research Report
  • [Presentation] A study of the saturation of analogical grids agnostically extracted from texts (2017)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the Computational Analogy Workshop at the 25th International Conference on Case-Based Reasoning (ICCBR-17), pages 7-16, Trondheim, Norway, August 2017.
    • Related Report
      2017 Annual Research Report
  • [Presentation] A holistic approach at a morphological inflection task (2017)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the 8th Language & Technology Conference (LTC’17), pages 88-92, Poznan, November 2017. Fundacja Uniwersytetu im. Adama Mickiewicza.
    • Related Report
      2017 Annual Research Report
  • [Presentation] Confidence of word forms generated in analogical grids (2017)

    • Author(s)
      P. Liu and Y. Lepage
    • Organizer
      Proceedings of the 11th International Collaboration Symposium on Information, Production and Systems (ISIPS 2017), pages 238-240, IPS, Waseda University, November 2017.
    • Related Report
      2017 Annual Research Report
  • [Presentation] Tools for the production of analogical grids and a resource of n-gram analogical grids in 11 languages (2017)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC 2018), Miyazaki, Japan, May 2018. (accepted, to appear)
    • Related Report
      2017 Annual Research Report
  • [Presentation] Analogical grids and clusters: assessment with machine translation [in French] (2017)

    • Author(s)
      Y. Lepage
    • Organizer
      40 ans de traduction automatique, Grenoble, France, July 2017
    • Related Report
      2017 Annual Research Report
    • Invited
  • [Presentation] Production of analogical clusters between marker-based chunks in Chinese and Japanese (2016)

    • Author(s)
      W. Yang, M. Gao, and Y. Lepage
    • Organizer
      Proceedings of the 10th International Collaboration Symposium on Information, Production and Systems (ISIPS 2016), pages 238-241.
    • Place of Presentation
      Kitakyushu
    • Year and Date
      2016-11-09
    • Related Report
      2016 Research-status Report
  • [Presentation] Morphological predictability of unseen words using computational analogy (2016)

    • Author(s)
      R. Fam and Y. Lepage
    • Organizer
      Proceedings of the Computational Analogy Workshop at the 24th International Conference on Case-Based Reasoning (ICCBR-16), pages 51-60.
    • Place of Presentation
      Atlanta, Georgia, USA.
    • Related Report
      2016 Research-status Report
  • [Presentation] Solving analogical equations between strings of symbols using neural networks (2016)

    • Author(s)
      V. Kaveeta and Y. Lepage
    • Organizer
      Proceedings of the Computational Analogy Workshop at the 24th International Conference on Case-Based Reasoning (ICCBR-16), pages 67-76.
    • Place of Presentation
      Atlanta, Georgia, USA.
    • Related Report
      2016 Research-status Report
  • [Remarks] Grants-in-Aid Kakenhi Kiban C 15K00317

    • Related Report
      2017 Annual Research Report
  • [Remarks] Projects / Kakenhi 15K00317 / Experimental results

    • URL

      http://lepage-lab.ips.waseda.ac.jp/index.php/2016-08-01-06-37-56/kakenhi-2/kakenhi-2-experiment-result

    • Related Report
      2016 Research-status Report

Published: 2015-04-16   Modified: 2019-03-29  
