2013 Fiscal Year Final Research Report
Building Named Entity Recognizers by combining a large-scale lexicon and corpora
Project/Area Number |
23700159
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | Tohoku University |
Principal Investigator |
OKAZAKI Naoaki Tohoku University, Graduate School of Information Sciences, Associate Professor (50601118)
|
Project Period (FY) |
2011 – 2012
|
Keywords | Natural language processing / Named entity recognition |
Research Abstract |
This research aims to build Named Entity Recognizers at low cost. A Named Entity Recognizer extracts mentions of entities or concepts belonging to specific semantic classes (e.g., product names and disease names) from text. To achieve this goal, the project addresses three challenges: (1) automatically acquiring training data in which mentions are annotated with semantic classes; (2) building Named Entity Recognizers from the automatically acquired training data; and (3) evaluating the resulting Named Entity Recognizers. We proposed a method that improves the quality of automatically acquired training data by using reference information in the dictionary, and demonstrated its effectiveness experimentally. We also proposed a method for mining context gazetteers, which are dependency paths that appear around expressions of the target semantic classes, and confirmed that they improve the accuracy of Named Entity Recognizers.
|
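As an illustration of challenge (1) in the abstract, the following is a minimal sketch of how a lexicon might be used to label text automatically. It is not the project's method: it assumes a simple longest-match lookup over lowercased tokens and BIO-style tags, and it does not model the dictionary reference information the project used to improve data quality. The function name `label_with_lexicon`, the `max_len` parameter, and the sample lexicon entries are all hypothetical.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[Tuple[str, ...], str]


def label_with_lexicon(tokens: List[str], lexicon: Lexicon, max_len: int = 5) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a lexicon.

    Assumption: lexicon keys are tuples of lowercased tokens and values are
    semantic class names. Real lexicon matching is noisier than this.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                semantic_class = lexicon[key]
                tags[i] = "B-" + semantic_class
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + semantic_class
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical lexicon entries, for illustration only.
sample_lexicon: Lexicon = {
    ("type", "2", "diabetes"): "DISEASE",
    ("diabetes",): "DISEASE",
}
sample_tokens = ["He", "has", "type", "2", "diabetes", "."]
sample_tags = label_with_lexicon(sample_tokens, sample_lexicon)
```

Sentences labeled this way could serve as noisy training data for a sequence tagger. Ambiguous or incomplete lexicon entries produce labeling errors, which is the kind of quality problem the abstract says the project addressed.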
Research Products
(13 results)