Summary of Research Achievements |
During the second year, work on tasks (a) to (c) was pursued in parallel. (a) Two approaches to casting strings into vector representations were tried: the first directly used Parikh vector representations, and the second used a neural network with one hidden layer. Recall and precision were measured on various data sets. (b) Several directions were explored. (b.1) A series of experiments on approximating real-valued vectors by integer-valued vectors was run, using several analogy test sets in several languages. The new, accelerated version of the programs, implemented during the first fiscal year, was used. No parallelogram representing an analogy between vectors could be discovered in any of the settings. This result has been published at an international conference. (b.2) Work was done on casting words from word analogy test sets into their definitions, i.e., sentences. The definitions, with the analogical structure induced by the word analogies, were used to fine-tune a sentence embedding space with contrastive learning. Such fine-tuned spaces delivered better performance in semantic similarity tasks. (b.3) Programs were written to automatically extract series of analogies from a subspace around a given word. Preliminary experiments were run on classical examples. The analogies obtained are almost always formal, although they originate from an embedding space built using the distributional hypothesis. (c) Parallelisation of the programs is considered to have been finished in the first fiscal year.
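To make the notions in (a) and (b.1) concrete, the following is a minimal sketch, not the project's actual programs. It assumes that a Parikh vector simply counts the occurrences of each symbol of a fixed alphabet in a string, and that an analogy A : B :: C : D corresponds to a parallelogram when A - B = C - D holds component-wise on integer vectors. The function names `parikh_vector` and `is_parallelogram` are hypothetical.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Return the symbol counts of s, ordered as in alphabet (illustrative helper)."""
    counts = Counter(s)
    return [counts[ch] for ch in alphabet]


def is_parallelogram(a, b, c, d):
    """Check a - b == c - d component-wise on integer vectors (illustrative helper)."""
    return all(x - y == z - w for x, y, z, w in zip(a, b, c, d))


# A classical formal analogy between strings, used here only as an example.
words = ["walk", "walked", "talk", "talked"]
alphabet = sorted(set("".join(words)))
vectors = [parikh_vector(w, alphabet) for w in words]

print(alphabet)
print(is_parallelogram(*vectors))
```

Because Parikh vectors discard symbol order, such a check can only serve as a necessary condition for a formal analogy between strings, not as a sufficient one.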
|
Current Status of Progress (Category) |
3: Slightly delayed
Reason
(a) The fact that no analogy can be found with existing tools was explored further, using several analogy test sets, several languages, and several word embedding spaces. A paper on these results has been published at an international conference. (b) Although various directions were explored, one of them with positive outcomes (fine-tuning of a sentence embedding space by relying on word analogies), the existing C programs still need to be modified. Preliminary work on re-engineering the C programs has started, in order to identify the places where the use of real values, as opposed to integers, entails modifications. (c) This task is considered finished, as no parallelisation could be introduced beyond what was done in the first fiscal year.
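As an illustration of the kind of place where real values may entail a modification, consider an equality test. This is a hypothetical Python sketch, not an excerpt from the C programs: it assumes that an analogy is tested as a component-wise parallelogram condition, which is exact on integers but would need a tolerance on floating-point values. The function names and the tolerance `tol` are assumptions made for this example.

```python
import math


def is_parallelogram_exact(a, b, c, d):
    """Exact test a - b == c - d, suitable for integer-valued vectors."""
    return all(x - y == z - w for x, y, z, w in zip(a, b, c, d))


def is_parallelogram_tolerant(a, b, c, d, tol=1e-9):
    """Same test with an absolute tolerance, for real-valued vectors.

    The default tolerance is an arbitrary illustrative choice.
    """
    return all(
        math.isclose(x - y, z - w, abs_tol=tol)
        for x, y, z, w in zip(a, b, c, d)
    )


# Integer vectors: the exact test is sufficient.
print(is_parallelogram_exact([1, 0], [1, 1], [2, 0], [2, 1]))

# Real vectors: rounding errors can make the exact test fail
# where the tolerant test accepts the same parallelogram.
a, b, c, d = [0.1], [0.2], [0.3], [0.4]
print(is_parallelogram_exact(a, b, c, d))
print(is_parallelogram_tolerant(a, b, c, d))
```

Choosing the tolerance is itself a design decision: too small and rounding noise rejects valid parallelograms, too large and unrelated vectors are accepted.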
|
Plan for Future Research |
In the third fiscal year, work on tasks (a) and (b) will continue. New computing power in the form of a GPU machine has been acquired. The acquisition of a new GPU card for this machine will be considered, if the budget permits. Work on the problem of converting programs that operate on integer values so that they operate on real values will continue. Work in the third fiscal year should benefit from the work done during the second fiscal year with various existing analogy test sets in various languages.
|