2016 Fiscal Year Research-status Report
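Extraction and Application of Visual Semantic Information from a Japanese Treebank Tagged with Syntactic and Semantic Analysis Information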
Project/Area Number |
15K02469
|
Research Institution | National Institute for Japanese Language and Linguistics |
Principal Investigator |
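Alastair Butler, National Institute for Japanese Language and Linguistics (National Institutes for the Humanities, Inter-University Research Institute Corporation), Theory & Typology Division, Project Adjunct Researcher (90588873)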
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Keywords | Corpus / Japanese / Semantics / Syntax |
Outline of Annual Research Achievements |
The research aims to develop methods for visualising semantic information and making it accessible. This covers predicate-argument information as well as higher levels of analysis, such as propositional connectives that distinguish coordination from subordination of structure. Such information makes it possible, for example, to map out binding dependencies, which has proved relevant as a method for reconstructing unpronounced arguments (zero pronouns) in Japanese, and to extract valence patterns for predicates, an essential part of word meaning.
To carry out this work, it has been necessary to continue developing a method for deriving semantic representations automatically from parsed syntactic representations, and to create a large base of already analysed, human-checked syntactic structures that can be transformed into semantic representations. Such a base serves as training data for creating more data of the same kind, with the potential to scale to large volumes.
|
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
The pipeline for producing analysed data has continued to improve. Models resulting from training are slightly smaller than a year ago despite a large increase in new data, reflecting improvements to the annotation.
The work on developing methods of visualising and making accessible semantic information has focused on ways to embed information back into parsed data. This has led to the enrichment of the existing corpus data with a second layer of special-purpose annotation made up of indexing information. The semantic information in the corpus can now be searched thanks to a conversion to the TIGER-XML format, whose structure-sharing mechanism (multi-dominance) can be queried.
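As a rough illustration of what querying structure sharing in TIGER-XML can look like, the following Python sketch finds nodes that have more than one parent. The sample fragment, its node labels, and the use of `secedge` elements to express the sharing are assumptions made for illustration only; they are not taken from the NPCMJ data, and the function name `shared_nodes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical TIGER-XML fragment (not actual NPCMJ output): the subject
# terminal s1_1 is attached to one clause by a primary edge and shared
# with a second clause through a secondary edge.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Taro" pos="NPR"/>
      <t id="s1_2" word="came" pos="VB"/>
      <t id="s1_3" word="sat" pos="VB"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="IP-MAT">
        <edge idref="s1_501" label="CONJ"/>
        <edge idref="s1_502" label="CONJ"/>
      </nt>
      <nt id="s1_501" cat="IP">
        <edge idref="s1_1" label="SBJ"/>
        <edge idref="s1_2" label="HD"/>
      </nt>
      <nt id="s1_502" cat="IP">
        <secedge idref="s1_1" label="SBJ"/>
        <edge idref="s1_3" label="HD"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def shared_nodes(xml_text):
    """Return {node_id: [(parent_id, label), ...]} for multiply dominated nodes."""
    root = ET.fromstring(xml_text)
    parents = defaultdict(list)
    # Visit nonterminals in document order and record every outgoing
    # primary (edge) or secondary (secedge) link.
    for nt in root.iter("nt"):
        for link in nt:
            if link.tag in ("edge", "secedge"):
                parents[link.get("idref")].append((nt.get("id"), link.get("label")))
    # Keep only nodes reachable from more than one parent.
    return {node: ps for node, ps in parents.items() if len(ps) > 1}


if __name__ == "__main__":
    for node, ps in shared_nodes(SAMPLE).items():
        print(node, ps)
```

In this sketch a shared argument shows up as a single node with two incoming links, which is the kind of pattern a multi-dominance query would target.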
Research results can be seen in the interfaces of the NINJAL Parsed Corpus of Modern Japanese (NPCMJ; http://npcmj.ninjal.ac.jp/interfaces/). Aside from a default tree view of the syntactic annotation, examples can be viewed as predicate logic formulas capturing semantic content (semantic view) and as trees with the calculated semantic content embedded as indexing information (indexed view). In addition, there is a visualisation of how the semantics was derived (eval view).
|
Strategy for Future Research Activity |
The semantic component will continue to be developed, especially for use as a basis for visualising dependencies. The existing indexing component will be extended to produce the character-indexed report format of FrameNet. This will allow the creation of browsable reports that display semantic dependencies intuitively.
A new "scaffolding" component will be built as a layer of automated analysis to further specify the part-of-speech analysis derived from morphological analysis systems (MeCab/Comainu). This additional specification is expected to lead to improvements in automatic parsing.
The project will also extend the range of analysed data to more genres and to historical Japanese texts.
|
Causes of Carryover |
Money has been carried over to pay for assistance in the process of undertaking human annotation correction.
|
Expenditure Plan for Carryover Budget |
Money has been carried over to pay for assistance in the process of undertaking human annotation correction.
|
Research Products
(11 results)