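Project/Area Number |
21K12038
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Application Section | General |
Review Section |
Basic Section 61030: Intelligent informatics-related
|
Research Institution | Waseda University |
Principal Investigator |
LEPAGE YVES, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Professor (70573608)
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Project Status |
Completed (Fiscal Year 2023)
|
Budget Amount *Note |
4,030 thousand yen (direct costs: 3,100 thousand yen; indirect costs: 930 thousand yen)
Fiscal Year 2023: 1,040 thousand yen (direct costs: 800 thousand yen; indirect costs: 240 thousand yen)
Fiscal Year 2022: 1,820 thousand yen (direct costs: 1,400 thousand yen; indirect costs: 420 thousand yen)
Fiscal Year 2021: 1,170 thousand yen (direct costs: 900 thousand yen; indirect costs: 270 thousand yen)
|
Keywords | natural language processing / word embeddings / analogy / inference / embeddings / analogy datasets / algorithms / deep learning |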
Outline of Research at the Start |
The most important recent breakthrough in Natural Language Processing (NLP) is the vector representation of words or parts of sentences. To assess the quality of vector representations of words, analogy test sets are used (France : Paris :: Japan : x => x = Tokyo). To date, such data sets have not been produced automatically. This research will study, explore and release theoretically well-founded methods to automatically extract analogy test sets, not only between words but also between parts of sentences and, ideally, for any language.
|
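As an illustration of how an analogy test item such as France : Paris :: Japan : x is typically scored against word vectors, the following minimal sketch uses the common vector-offset approach (pick the word closest in cosine similarity to b - a + c). It is not taken from the project itself: the 2-D vectors in `TOY_VECTORS` are invented for illustration, and the names `cosine` and `solve_analogy` are hypothetical.

```python
import math

# Toy 2-D vectors, invented for illustration only (not real embeddings).
TOY_VECTORS = {
    "France": (1.0, 0.0),
    "Paris": (1.0, 1.0),
    "Germany": (2.0, 0.0),
    "Berlin": (2.0, 1.0),
    "Japan": (4.0, 0.0),
    "Tokyo": (4.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is the zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Solve a : b :: c : x by the vector-offset method.

    The target point is b - a + c; the answer is the vocabulary word,
    other than a, b and c, whose vector is most similar to that target.
    """
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(solve_analogy("France", "Paris", "Japan", TOY_VECTORS))
```

An analogy test set is then simply a list of such (a, b, c, expected x) items, and the accuracy of an embedding is the fraction of items for which the returned word matches the expected one.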
Outline of Annual Research Achievements |
The purpose of the research was to address the lack of analogy test sets for evaluating the quality of vector representations of words or sentences. A particular concern was to examine solutions applicable to various languages. A parallelized version of existing tools for integer-valued string representations (task (c) in the proposal) was produced. It was used to study morphological analysis and generation in many languages, and to produce various kinds of sentence analogies in many languages, at both the formal and semantic levels, as features such as informativeness were used. However, casting integer-valued edit distance ratios into real-valued vector representations was shown to be a hard problem: in practice, approximations do not make it possible to find analogies in word embedding spaces (tasks (a) and (b) in the proposal). The project therefore proposed techniques for the exhaustive extraction of analogies in word embedding spaces, together with assessment methods (first sub-problem [EXTRACTALL] in the proposal): produce all possible word analogies that involve words in a given region. To solve analogies between sentences at the semantic level, various neural models were proposed. This led to the production of new semantico-formal, or fuzzy, sentence analogy test sets, not only in English but also in other languages such as Japanese, German and Upper Sorbian (second sub-problem [SEM&FORM] in the proposal). An important outcome of the project is the discovery of a new formalisation of analogy between non-negative real numbers, which is a very promising direction for exploring analogy in vector representations.
|
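To make the [EXTRACTALL] sub-problem concrete (produce all word analogies among the words of a given region), here is a brute-force sketch. It is an assumption-laden illustration, not the technique proposed by the project: it treats a : b :: c : d as holding when the offsets b - a and d - c coincide within a tolerance, uses invented 2-D vectors, and the names `equivalent_forms` and `extract_analogies` are hypothetical.

```python
from itertools import permutations

# Toy 2-D vectors, invented for illustration only (not real embeddings).
REGION = {
    "France": (1.0, 0.0),
    "Paris": (1.0, 1.0),
    "Germany": (2.0, 0.0),
    "Berlin": (2.0, 1.0),
    "Japan": (4.0, 0.0),
    "Tokyo": (4.0, 1.0),
}


def equivalent_forms(a, b, c, d):
    """The eight equivalent ways of writing the analogy a : b :: c : d."""
    return [
        (a, b, c, d), (a, c, b, d), (c, d, a, b), (c, a, d, b),
        (b, a, d, c), (b, d, a, c), (d, c, b, a), (d, b, c, a),
    ]


def extract_analogies(vectors, tol=1e-9):
    """Enumerate all analogies between four distinct words of `vectors`.

    Assumption: a : b :: c : d holds when b - a equals d - c in every
    coordinate, up to `tol`. Each analogy is reported once, using the
    lexicographically smallest of its eight equivalent forms.
    """
    found = set()
    for a, b, c, d in permutations(vectors, 4):
        va, vb, vc, vd = vectors[a], vectors[b], vectors[c], vectors[d]
        same_offset = all(
            abs((y - x) - (t - z)) <= tol
            for x, y, z, t in zip(va, vb, vc, vd)
        )
        if same_offset:
            found.add(min(equivalent_forms(a, b, c, d)))
    return sorted(found)


if __name__ == "__main__":
    for a, b, c, d in extract_analogies(REGION):
        print(f"{a} : {b} :: {c} : {d}")
```

The enumeration is quartic in the number of words, so a sketch like this is only usable on a small region. With real embeddings, offsets are never exactly equal, so the choice of tolerance (or of a different acceptance criterion) becomes the central design decision.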