Project/Area Number |
21K12038
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 61030:Intelligent informatics-related
|
Research Institution | Waseda University |
Principal Investigator |
LEPAGE YVES Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Professor (70573608)
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Project Status |
Completed (Fiscal Year 2023)
|
Budget Amount |
¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2023: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2022: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
|
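Keywords | cognitive ability / analogical relations / exhaustive extraction of analogical relations / word embedding space / neural network models for analogical relations between sentences / analogical relations between real values / analogical relations between Boolean values / analogical relations between integer values / natural language processing / word embedding representations / inference / embedding representations / analogy datasets / algorithms / deep learning |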
Outline of Research at the Start |
The most important recent breakthrough in Natural Language Processing (NLP) is the vector representation of words or parts of sentences. To assess the quality of vector representations of words, analogy test sets are used (France : Paris :: Japan : x => x = Tokyo). Until now, the production of such data sets has not been automated. This research will study, explore and release theoretically well-founded methods to automatically extract analogy test sets, not only between words but also between parts of sentences, and ideally for any language.
|
Outline of Final Research Achievements |
Recent artificial intelligence uses numbers to represent the meaning of words or sentences. To evaluate whether the meaning is correctly represented, analogy datasets are used. However, the construction of analogy datasets had not been automated until now, and those constructed manually in English are biased toward English, even when translated into Japanese, and biased toward particular types of analogical relations. By automatically constructing multilingual analogy datasets, we were able to show that they are useful for the analysis and generation of regular and irregular word forms, and to discover new semantic analogical relations between words. From the construction of sentence analogy datasets, we learned which sentence patterns contain more analogical relations. We proposed a paraphrase-based method for constructing sentence analogy datasets, and also proposed neural network models for understanding and solving analogical relations.
|
Academic Significance and Societal Importance of the Research Achievements |
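One of the characteristic cognitive behaviors of humans is recognizing analogical relations. For example, to the question "man" : "woman" :: "king" : what?, "queen" is a possible answer. Likewise, "I like this song." : "I feel like singing." :: "I like this game." : "I feel like playing." is an example between sentences. To measure how far the word and sentence representations of state-of-the-art artificial intelligence possess this cognitive ability, analogy datasets are needed. This research examined the construction of analogy datasets between words and between sentences. We proposed and examined methods that work not only for English but for multiple languages, and that cover not only certain classical analogical relations (gender, country and capital) but a broader range of relations.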
|