Language productivity: fast extraction of productive analogical clusters and their evaluation using statistical machine translation

Research Project

Project/Area Number	15K00317
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Intelligent informatics
Research Institution	Waseda University
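Principal Investigator	LEPAGE YVES Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems / Research Center), Professor (70573608)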
Research Collaborator	YANG Wei FAM Rashel SUSANTI GOJALI
Project Period (FY)	2015-04-01 – 2018-03-31
Project Status	Completed (Fiscal Year 2017)
Budget Amount	¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
	Fiscal Year 2017: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
	Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
	Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords	Natural language processing / Artificial intelligence / Data structures / Morphologically rich languages / Chinese and Japanese
Outline of Final Research Achievements	The goals of the project were (1) to build tools that produce analogical clusters from monolingual data, (2) to use such clusters to produce quasi-parallel corpora, (3) to use these quasi-parallel corpora in addition to parallel corpora, and (4) to thereby improve translation accuracy in statistical machine translation (SMT). The tools were built and publicly released. Beyond what was announced in the research plan, a new data structure, the analogical grid, was introduced. Data were produced for languages ranging from morphologically poor to morphologically rich: 11 European languages (n-grams from single words up to 6-grams), Chinese and Japanese (short sentences of fewer than 30 characters, for SMT experiments), and additional languages (word forms in Arabic, Georgian, Navajo, Russian, Turkish, etc.). Part of the data has been publicly released. Various experiments showed that quasi-parallel data produced by analogy, and then filtered, can improve translation accuracy in Chinese-Japanese SMT.

Report

(4 results)

2017 Annual Research Report
Final Research Report (PDF)
2016 Research-status Report
2015 Research-status Report

Research Products
(19 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results)
Presentation (15 results) (of which Invited: 3 results)
Remarks (2 results)

[Journal Article] Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation (2017)
- Author(s)
  W. Yang, H. Shen, and Y. Lepage
- Journal Title
  Journal of Information Processing
  Volume: 25 Issue: 0 Pages: 88-99
- DOI
  10.2197/ipsjjip.25.88
- NAID
  130005292406
- ISSN
  1882-6652
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] A method of generating translations of unseen n-grams by using proportional analogy (2016)
- Author(s)
  J. Luo and Y. Lepage
- Journal Title
  IEEJ Transactions in Electronics, Information and Systems
  Volume: 11 Issue: 3 Pages: 325-330
- DOI
  10.1002/tee.22221
- Related Report
  2016 Research-status Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Presentation] Plausibility of word forms generated from analogical grids in Indonesian (2018)
- Author(s)
  R. Fam, A. Purwarianti, and Y. Lepage
- Organizer
  Proceedings of the 16th International Conference on Computer Applications (ICCA 2018), pages 179--184, Yangon, Myanmar, February 2018.
- Related Report
  2017 Annual Research Report
[Presentation] Validating analogically generated Indonesian words using Fisher’s exact test (2018)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 24th Annual Meeting of the Japanese Association for Natural Language Processing, pages 312--315, Okayama, Japan, March 2018.
- Related Report
  2017 Annual Research Report
[Presentation] Automatic Production of Quasi-parallel Corpora for Machine Translation (2018)
- Author(s)
  Y. Lepage
- Organizer
  International Conference on Natural Language, Signal and Speech Processing 2017, Casablanca, Morocco, 06--07 Dec. 2017
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Quasi-Parallel Corpora: Hallucinating Translations for the Chinese-Japanese Language Pair (2018)
- Author(s)
  Y. Lepage
- Organizer
  BUCC workshop co-located with LREC 2018, Miyazaki, Japan, May 2018
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Indonesian unseen words explained by form, morphology and distributional semantics at the same time (2017)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 23rd Annual Meeting of the Association for Natural Language Processing (NLP2017), pages 178--181.
- Place of Presentation
  University of Tsukuba
- Year and Date
  2017-03-14
- Related Report
  2016 Research-status Report
[Presentation] A study in explaining unseen words in Indonesian using analogical clusters (2017)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  In Proceedings of 15th International Conference on Computer Applications (ICCA 2017), pages 416--421.
- Place of Presentation
  Yangon, Myanmar
- Year and Date
  2017-02-16
- Related Report
  2016 Research-status Report
[Presentation] Character-position arithmetic for analogy questions between word forms (2017)
- Author(s)
  Y. Lepage
- Organizer
  Proceedings of the Computational Analogy Workshop at the 25th International Conference on Case-Based Reasoning (ICCBR-17), pages 17--26, Trondheim, Norway, August 2017.
- Related Report
  2017 Annual Research Report
[Presentation] A study of the saturation of analogical grids agnostically extracted from texts (2017)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the Computational Analogy Workshop at the 25th International Conference on Case-Based Reasoning (ICCBR-17), pages 7--16, Trondheim, Norway, August 2017.
- Related Report
  2017 Annual Research Report
[Presentation] A holistic approach at a morphological inflection task (2017)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 8th Language & Technology Conference (LTC’17), pages 88--92, Poznan, November 2017. Fundacja uniwersytetu im. Adama Mickiewicza.
- Related Report
  2017 Annual Research Report
[Presentation] Confidence of word forms generated in analogical grids (2017)
- Author(s)
  P. Liu and Y. Lepage
- Organizer
  Proceedings of the 11th International collaboration Symposium on Information, Production and Systems (ISIPS 2017), pages 238--240, IPS, Waseda University, Nov. 2017.
- Related Report
  2017 Annual Research Report
[Presentation] Tools for the production of analogical grids and a resource of n-gram analogical grids in 11 languages (2017)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC 2018), Miyazaki, Japan, May 2018. (accepted, to appear)
- Related Report
  2017 Annual Research Report
[Presentation] Analogical grids and clusters: assessment with machine translation [in French] (2017)
- Author(s)
  Y. Lepage
- Organizer
  40 ans de traduction automatique, Grenoble, France, July 2017
- Related Report
  2017 Annual Research Report
- Invited
[Presentation] Production of analogical clusters between marker-based chunks in Chinese and Japanese (2016)
- Author(s)
  W. Yang, M. Gao, and Y. Lepage
- Organizer
  In Proceedings of the 10th International collaboration Symposium on Information, Production and Systems (ISIPS 2016), pages 238--241.
- Place of Presentation
  Kitakyushu
- Year and Date
  2016-11-09
- Related Report
  2016 Research-status Report
[Presentation] Morphological predictability of unseen words using computational analogy (2016)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the Computational Analogy Workshop at the 24th International Conference on Case-Based Reasoning (ICCBR-16), pages 51--60.
- Place of Presentation
  Atlanta, Georgia, USA.
- Related Report
  2016 Research-status Report
[Presentation] Solving analogical equations between strings of symbols using neural networks (2016)
- Author(s)
  V. Kaveeta and Y. Lepage
- Organizer
  In Proceedings of the Computational Analogy Workshop at the 24th International Conference on Case-Based Reasoning (ICCBR-16), pages 67--76.
- Place of Presentation
  Atlanta, Georgia, USA.
- Related Report
  2016 Research-status Report
[Remarks] Grants-in-Aid Kakenhi Kiban C 15K00317
- Related Report
  2017 Annual Research Report
[Remarks] Projects / Kakenhi 15K00317 / Experimental results
- URL
  http://lepage-lab.ips.waseda.ac.jp/index.php/2016-08-01-06-37-56/kakenhi-2/kakenhi-2-experiment-result
- Related Report
  2016 Research-status Report
