Summary of Research Achievements |
The research aims to develop methods for visualising semantic information and making it accessible. This covers predicate-argument information as well as higher levels of analysis, such as propositional connectives that distinguish coordination from subordination of structure. Such information makes it possible, for example, to map out binding dependencies, which has proved relevant as a method for reconstructing unpronounced argument information (zero pronouns) in Japanese and for extracting valence patterns for predicates, an essential part of word meaning.
To carry out this work, it has been necessary to continue developing a method for deriving semantic representations automatically from syntactically parsed representations, and to create a large base of already analysed, human-checked syntactic structures that can be transformed into semantic representations. Such a base provides training data for producing yet more data of the same kind, with the potential to scale to large volumes.
|
Current Status of Research Progress (Category) |
1: Progressing beyond the original plan
Reason
The pipeline for producing analysed data has continued to improve. Models resulting from training are slightly smaller than a year ago despite a large increase in new data, reflecting improvements to the annotation.
The work on developing methods for visualising semantic information and making it accessible has focused on ways to embed that information back into the parsed data. This has led to the enrichment of the existing corpus data with a second layer of special-purpose annotation made up of indexing information. The semantic information in the corpus can now be searched thanks to a transformation into the TIGER-XML format, which includes a structure-sharing mechanism (multi-dominance) that can be queried.
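As a rough illustration of what querying multi-dominance can look like, the following Python sketch scans a TIGER-XML-style graph for nodes that are reached by more than one edge. The sample sentence, its node IDs, category labels and edge labels are invented for this example, and the helper name `shared_nodes` is hypothetical; the actual NPCMJ encoding and query tools may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented TIGER-XML-style fragment (an assumption, not NPCMJ data):
# the terminal s1_1 is dominated by two different clause nodes.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Taro" pos="NPR"/>
      <t id="s1_2" word="came" pos="VB"/>
      <t id="s1_3" word="sat" pos="VB"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="IP-MAT">
        <edge label="CONJ" idref="s1_501"/>
        <edge label="CONJ" idref="s1_502"/>
      </nt>
      <nt id="s1_501" cat="IP">
        <edge label="SBJ" idref="s1_1"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
      <nt id="s1_502" cat="IP">
        <edge label="SBJ" idref="s1_1"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def shared_nodes(xml_text):
    """Return {node_id: [(parent_id, edge_label), ...]} for every node
    that is the target of more than one edge or secedge."""
    root = ET.fromstring(xml_text.strip())
    parents = defaultdict(list)
    for nt in root.iter("nt"):
        for edge in nt:
            # Both primary and secondary edges count as dominance links here.
            if edge.tag in ("edge", "secedge"):
                parents[edge.get("idref")].append((nt.get("id"), edge.get("label")))
    return {node: links for node, links in parents.items() if len(links) > 1}


if __name__ == "__main__":
    for node, links in shared_nodes(SAMPLE).items():
        print(node, links)
```

In this toy graph the only shared node is the subject terminal, which is the kind of configuration that arises when one argument is understood in two clauses. Counting `secedge` elements alongside `edge` elements is a design choice of the sketch, since structure sharing may be encoded either way.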
Research results can be seen in the interfaces of the NINJAL Parsed Corpus of Modern Japanese (NPCMJ; http://npcmj.ninjal.ac.jp/interfaces/). In addition to the default tree view of the syntactic annotation, examples can be viewed as predicate logic formulas capturing their semantic content (semantic view), or as trees with the calculated semantic content embedded as indexing information (indexed view). There is also a visualisation of how the semantics was derived (eval view).
|