2017 Fiscal Year Final Research Report

Studies on robust statistical parsing across different domains using word embeddings

Research Project

Project/Area Number	16H06981
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Single-year Grants
Research Field	Intelligent informatics
Research Institution	Nara Institute of Science and Technology
Principal Investigator	Noji Hiroshi  Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (00782541)
Project Period (FY)	2016-08-26 – 2018-03-31
Keywords	Parsing / Combinatory Categorial Grammar / Domain adaptation
Outline of Final Research Achievements	A problem in machine-learning-based statistical natural language processing is that systems perform poorly on text from a domain different from that of the training data. Because most systems, such as parsers, are trained on annotated newspaper text, their performance drops significantly on other kinds of text, e.g., web text and scientific papers. Toward a parsing method that is more robust across domains, we first developed a new, simple parser based on Combinatory Categorial Grammar (CCG), which has the advantage of requiring no preprocessing such as POS tagging. We also designed a new neural network architecture for parser domain adaptation and verified the effectiveness of the approach.
Free Research Field	Computational linguistics