2017 Fiscal Year Research-status Report
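Extraction and Application of Visual Semantic Information from a Japanese Treebank Tagged with Syntactic and Semantic Analysis Information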
Project/Area Number |
15K02469
|
Research Institution | National Institute for Japanese Language and Linguistics |
Principal Investigator |
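Alastair Butler, National Institute for Japanese Language and Linguistics, National Institutes for the Humanities (Inter-University Research Institute Corporation), Departments etc. of Inter-University Research Institutes, Researcher (90588873)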
|
Project Period (FY) |
2015-04-01 – 2019-03-31
|
Keywords | semantic dependencies / parsed corpus / visualisation / annotation / predicate arguments / discourse relations |
Outline of Annual Research Achievements |
The research aim has been to develop methods of visualising and making accessible semantic information from analyses of Japanese and English, e.g., predicate argument information, but also higher levels of analysis, such as propositional connectives as well as modals, negation and factors of discourse.
The key part of this work has been the development of a visualisation tool for semantic relationships derivable from a parsed corpus. This enables human annotators to assess whether their interpretations of discourse have been adequately captured by the parsed corpus. As now realised, the tool can capture many of the relationships found in discourse, providing a framework in which a fleshed-out account of semantic roles, quantification, and modality becomes feasible.
|
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
The visualisation tool is now a key part of the creation and presentation chain of three corpus resources: the NINJAL Parsed Corpus of Modern Japanese (NPCMJ; http://npcmj.ninjal.ac.jp), the Oxford-NINJAL Corpus of Old Japanese (ONCOJ; http://oncoj.ninjal.ac.jp/?lang=en), and the Treebank Semantics Parsed Corpus (TSPC; http://www.compling.jp/ajb129/tspc.html).
The developed visualisation tool has revealed layers of dependencies that were not easily visible before. At the same time, the tool has revealed inadequacies of analyses in the present state of the corpus data.
|
Strategy for Future Research Activity |
Until now, two essential components for establishing semantic dependencies (the allocation of "sort" information and the specification of clause linkages) have been handled by a small number of specialists who are able to cash out the results of complex grammatical rules (such as those involving an antecedent hierarchy) and build these into annotation information without the aid of visualisation tools. The project is now in a position to turn these tasks over to non-specialists, who need only have intuitions about meaningful relationships in texts and enough knowledge to spot whether those relationships are represented in the visualisation. Only after reviewing the results of a program of annotation that takes advantage of this new technology can the adequacy of the tool be properly assessed and the feasibility of including additional layers of semantic information be ascertained. For the remainder of the term of the project, the plan is to increase the volume of relevant data by hiring annotators, and to publicise the results of the project domestically and abroad at academic conferences.
|
Causes of Carryover |
The developed visualisation tool has revealed layers of dependencies that were not easily visible before. At the same time, the tool has revealed inadequacies of analyses in the present state of the corpus data.
For the remainder of the term of the project the plan is to increase the volume and quality of relevant data by hiring annotators, and to publicise the results of the project domestically and abroad at academic conferences.
|
Remarks |
The Treebank Semantics Parsed Corpus (TSPC) and the Keyaki Treebank are corpus resources that can be viewed and downloaded. Treebank Semantics implements a method for obtaining meaning representations.
|
Research Products
(8 results)