Theoretically founded algorithms for the automatic production of analogy tests in NLP

Research Project

Project/Area Number	21K12038
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Waseda University
Principal Investigator	LEPAGE YVES, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Professor (70573608)
Project Period (FY)	2021-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
  Fiscal Year 2023: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
  Fiscal Year 2022: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
  Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
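Keywords	cognitive ability / analogical relations / exhaustive extraction of analogical relations / word embedding space / neural network models for analogies between sentences / analogies between real numbers / analogies between Booleans / analogies between integers / natural language processing / word embedding representations / inference / embedding representations / analogy datasets / algorithms / deep learning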
Outline of Research at the Start	The most important recent breakthrough in Natural Language Processing (NLP) is the vector representation of words or parts of sentences. To assess the quality of vector representations of words, analogy test sets are used (France : Paris :: Japan : x => x = Tokyo). Up to now, the production of such data sets has not been automated. This research will study, explore and release theoretically well-founded methods to automatically extract analogy test sets, not only between words but also between parts of sentences and, it is expected, for any language.
Outline of Final Research Achievements	Recent artificial intelligence uses numbers to represent the meaning of words or sentences. To evaluate whether the meaning is correctly represented, analogy datasets are used. However, the construction of analogy datasets had not been automated until now, and those constructed manually in English are biased toward English, even when translated into Japanese, and biased toward particular types of analogical relations. By automatically constructing multilingual analogy datasets, we were able to show that they are useful for the analysis and generation of regular and irregular word forms, and to discover new semantic analogical relations between words. From the construction of sentence analogy datasets, we learned which sentence patterns contain more analogical relations. We proposed a paraphrase-based method for constructing sentence analogy datasets, and also proposed neural network models for understanding and solving analogical relations.
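Academic Significance and Societal Importance of the Research Achievements	One of the characteristic cognitive behaviours of humans is recognising analogical relations. For example, "queen" is a possible answer to the question "man" : "woman" :: "king" : what? Likewise, "I like this song." : "I feel like singing." :: "I like this game." : "I feel like playing." is an example between sentences. Analogy datasets are needed to measure to what extent the word and sentence representations used in state-of-the-art artificial intelligence possess this cognitive ability. This research examined the construction of analogy datasets between words and between sentences. We proposed and examined methods that work not only for English but for multiple languages, and that go beyond the classical analogical relations (gender, country and capital) to cover a wider range.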
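The analogy test quoted in the outline (France : Paris :: Japan : x) is commonly solved in a word embedding space with the vector-offset heuristic: compute b - a + c and pick the nearest remaining word by cosine similarity. The following is a minimal sketch of that general heuristic, not the project's own method. The 2-dimensional vectors, the `EMBEDDINGS` table and the function names are invented for illustration.

```python
import math

# Toy 2-dimensional embeddings, invented for illustration only.
# Real embedding spaces have hundreds of dimensions.
EMBEDDINGS = {
    "France": (1.0, 0.0),
    "Paris": (1.0, 1.0),
    "Japan": (3.0, 0.0),
    "Tokyo": (3.0, 1.0),
    "Germany": (5.0, 0.0),
    "Berlin": (5.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Solve a : b :: c : x by the vector-offset heuristic.

    Returns the word whose vector is closest (by cosine) to b - a + c,
    excluding the three words given in the question.
    """
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


answer = solve_analogy("France", "Paris", "Japan", EMBEDDINGS)
print(answer)
```

An analogy test set is then a list of such (a, b, c, x) quadruples, and an embedding space is scored by the proportion of quadruples for which the returned word equals x.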

Report

(4 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Research-status Report
2021 Research-status Report

Research Products
(26 results)

Journal Article (7 results) (of which Int'l Joint Research: 6 results, Peer Reviewed: 6 results, Open Access: 3 results)
Presentation (17 results) (of which Int'l Joint Research: 11 results, Invited: 6 results)
Remarks (2 results)

[Journal Article] A study of universal morphological analysis using morpheme-based, holistic, and neural approaches under various data size conditions (2024)
- Author(s)
  R. Fam and Y. Lepage
- Journal Title
  
  Annals of Mathematics and Artificial Intelligence
  
  Volume: To appear
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Learning from masked analogies between sentences at multiple levels of formality (2023)
- Author(s)
  Wang Liyan, Lepage Yves
- Journal Title
  
  Annals of Mathematics and Artificial Intelligence
  
  Volume: 93 Issue: 2 Pages: 237-261
- DOI
  10.1007/s10472-023-09918-2
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A study in the generation of multilingually aligned middle sentences (2023)
- Author(s)
  M. Eget, X. Yang, and Y. Lepage
- Journal Title
  
  Proceedings of the 10th Language & Technology Conference (LTC 2023) & Human Language Technologies as a Challenge for Computer Science and Linguistics
  
  Volume: 0 Pages: 45-49
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Investigating parallelograms: Assessing several word embedding spaces against various analogy test sets in several languages using approximation (2023)
- Author(s)
  R. Fam and Y. Lepage
- Journal Title
  
  Proceedings of the 10th Language & Technology Conference (LTC 2023) & Human Language Technologies as a Challenge for Computer Science and Linguistics
  
  Volume: 0 Pages: 68-72
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Solving sentence analogies by using embedding spaces combined with a vector-to-sequence decoder or by fine-tuning pre-trained language models (2023)
- Author(s)
  L. Wang, Z. Pang, H. Wang, X. Zhao, and Y. Lepage
- Journal Title
  
  Proceedings of the 10th Language & Technology Conference (LTC 2023) & Human Language Technologies as a Challenge for Computer Science and Linguistics
  
  Volume: 0 Pages: 325-330
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Organising lexica into analogical grids: a study of a holistic approach for morphological generation under various sizes of data in various languages (2022)
- Author(s)
  Fam Rashel, Lepage Yves
- Journal Title
  
  Journal of Experimental & Theoretical Artificial Intelligence
  
  Volume: 0 Pages: 1-26
- Related Report
  2022 Research-status Report
[Journal Article] A Study of Analogical Density in Various Corpora at Various Granularity (2021)
- Author(s)
  Fam Rashel, Lepage Yves
- Journal Title
  
  Information
  
  Volume: 12 Issue: 8 Pages: 314-314
- DOI
  10.3390/info12080314
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Analogie et moyenne generalisee (2024)
- Author(s)
  Y. Lepage and M. Couceiro
- Organizer
  In Actes de la conference Journees d'intelligence artificielle francaises -- Plateforme francaise d'intelligence artificielle (PFIA-JIAF 2024) (Accepted, to appear)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Continued pre-training on sentence analogies for translation with small data (2024)
- Author(s)
  L. Wang, H. Wang, and Y. Lepage
- Organizer
  LREC-COLING 2024 (to appear)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] A study in the generation of multilingually aligned middle sentences (2023)
- Author(s)
  M. Eget, X. Yang, and Y. Lepage
- Organizer
  Proceedings of the 10th Language & Technology Conference (LTC 2023) -- Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 45--49, April 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Investigating parallelograms: Assessing several word embedding spaces against various analogy test sets in several languages using approximation (2023)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 10th Language & Technology Conference (LTC 2023) -- Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 68--72, April 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Solving sentence analogies by using embedding spaces combined with a vector-to-sequence decoder or by fine-tuning pre-trained language models (2023)
- Author(s)
  L. Wang, Z. Pang, H. Wang, X. Zhao, and Y. Lepage
- Organizer
  Proceedings of the 10th Language & Technology Conference (LTC 2023) -- Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 325--330, April 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Resolution of analogies between strings in the case of multiple solutions (2023)
- Author(s)
  X. Deng and Y. Lepage
- Organizer
  In CEUR, editor, Proceedings of ICCBR: Workshop on Analogies: from Theory to Applications (ATA@ICCBR 2023), CEUR Workshop Proceedings, pages 3-14, July 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Embedding-to-embedding method based on autoencoder for solving sentence analogies (2023)
- Author(s)
  W. Mao and Y. Lepage
- Organizer
  In CEUR, editor, Proceedings of ICCBR: Workshop on Analogies: from Theory to Applications (ATA@ICCBR 2023), CEUR Workshop Proceedings, pages 15-26, July 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Improving sentence embedding with sentence relationships from word analogies (2023)
- Author(s)
  Q. Zhang and Y. Lepage
- Organizer
  In CEUR, editor, Proceedings of ICCBR: Workshop on Analogies: from Theory to Applications (ATA@ICCBR 2023), CEUR Workshop Proceedings, pages 43-53, July 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Formulae for the solution of an analogical equation between Booleans using the Sheffer stroke (NAND) or the Pierce arrow (NOR) (2023)
- Author(s)
  Y. Lepage
- Organizer
  Proceedings of the Workshop Interactions between analogies and machine learning, colocated with IJCAI 2023 (IARML@IJCAI 2023), pages 3-14, August 2023.
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] A framework for neural machine translation by fuzzy analogies (2023)
- Author(s)
  L. Wang, B. Wloka, and Y. Lepage
- Organizer
  Proceedings of the Workshop Interactions between analogies and machine learning, colocated with IJCAI 2023 (IARML@IJCAI 2023), pages 47-55, August 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Analogie et donnees de langue (Analogy and language data, in French) (2023)
- Author(s)
  Y. Lepage
- Organizer
  Colloquium LORIA, 15 nov. 2023, LORIA, Nancy, France
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Analogie, explication des donnees de langue et travaux recents sur representations vectorielles de phrases et analogie (2023)
- Author(s)
  Y. Lepage
- Organizer
  Workshop Analogies: From learning to explainability, 27-28 nov. 2023, Arras, France
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Analogie et moyenne : considerations generales et application aux chaines (Analogy and means: general considerations and applications to strings, in French) (2023)
- Author(s)
  Y. Lepage
- Organizer
  Forum sciences cognitives et traitement automatique des langues, 29 nov. 2023, Nancy, France
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Jeux d'analogies pour le TAL (Analogy test sets for NLP, in French) (2023)
- Author(s)
  Y. Lepage
- Organizer
  MALOTEC/LORIA seminar, 13 dec. 2023, Nancy, France
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Investigating parallelograms inside word embedding space using various analogy test sets in various languages (2023)
- Author(s)
  R. Fam and Y. Lepage
- Organizer
  Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing, Naha, pages 718--722
- Related Report
  2023 Annual Research Report
[Presentation] Giving a structure to language data: from analogies to analogical grids (2022)
- Author(s)
  Yves Lepage
- Organizer
  Invited talk at the seminar of Dublin City University (DCU), 4th of July 2022.
- Related Report
  2022 Research-status Report
- Invited
[Presentation] Analogy on text data (2022)
- Author(s)
  Yves Lepage
- Organizer
  Invited talk at the workshop Interaction between Analogical Reasoning and Machine Learning (IARML 2022), 23rd of July 2022.
- Related Report
  2022 Research-status Report
- Int'l Joint Research / Invited
[Remarks] Kakenhi Project 21K12038
- URL
  http://lepage-lab.ips.waseda.ac.jp/projects/Kakenhi_Project_21K12038/
- Related Report
  2023 Annual Research Report
[Remarks] Kakenhi Kiban C 18K11447
- URL
  http://lepage-lab.ips.waseda.ac.jp/en/projects/kakenhi-kiban-c-18k11447/
- Related Report
  2022 Research-status Report
