Building Named Entity Recognizers by combining a large-scale lexicon and corpora
Project/Area Number |
23700159
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | Tohoku University |
Principal Investigator |
OKAZAKI Naoaki Tohoku University, Graduate School of Information Sciences, Associate Professor (50601118)
|
Project Period (FY) |
2011 – 2012
|
Project Status |
Completed (Fiscal Year 2013)
|
Budget Amount |
¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2012: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Fiscal Year 2011: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
|
Keywords | Natural language processing / Named entity recognition / Information extraction / Machine learning / Knowledge acquisition |
Research Abstract |
This research builds Named Entity Recognizers, which extract mentions of entities or concepts belonging to specific semantic classes (e.g., product names and disease names) from text, at low cost. To achieve this goal, the project addresses three challenges: (1) automatically acquiring training data in which mentions are annotated with semantic classes; (2) building Named Entity Recognizers from the automatically acquired training data; and (3) evaluating the resulting Named Entity Recognizers. We proposed a method that improves the quality of automatically acquired training data by using reference information in the dictionary, and demonstrated its effectiveness experimentally. We also proposed a method for mining context gazetteers, which are dependency paths that appear around expressions of the target semantic classes, and confirmed that they improve the accuracy of Named Entity Recognizers.
|
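As a rough illustration of challenge (1) in the abstract, the sketch below labels tokenized text by longest-match lookup against a lexicon and emits BIO tags that could serve as automatically acquired training data. It is not the project's method: the function name `annotate_with_lexicon`, the lexicon layout (lowercased token tuples mapped to a class label), and the sample entries are all assumptions made for this example, and the quality-improvement step that uses reference information in the dictionary is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon layout: a tuple of lowercased tokens maps to a semantic class.
Lexicon = Dict[Tuple[str, ...], str]


def annotate_with_lexicon(
    tokens: List[str], lexicon: Lexicon, max_len: int = 5
) -> List[Tuple[str, str]]:
    """Tag tokens with BIO labels using greedy longest-match dictionary lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word entries win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            label = lexicon.get(key)
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


# Illustrative entries only; a real lexicon would be far larger.
sample_lexicon: Lexicon = {
    ("type", "2", "diabetes"): "DISEASE",
    ("aspirin",): "PRODUCT",
}
sentence = "Patients with type 2 diabetes took aspirin daily".split()
for token, tag in annotate_with_lexicon(sentence, sample_lexicon):
    print(token, tag)
```

Plain matching of this kind produces noisy labels, for example when a lexicon entry is ambiguous in context, which is the sort of noise the abstract's quality-improvement method is meant to reduce.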
Report
(4 results)
Research Products
(39 results)