Simplifying Complicated Sentences for Information Extraction from Text Documents
Project/Area Number |
24500193
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | National Institute of Genetics |
Principal Investigator |
HARA KAZUO National Institute of Genetics, Center for Information Biology, Project Researcher (30467691)
|
Co-Investigator(Renkei-kenkyūsha) |
OKUBO KOUSAKU National Institute of Genetics, Center for Information Biology, Professor (40233069)
|
Project Period (FY) |
2012-04-01 – 2016-03-31
|
Project Status |
Completed (Fiscal Year 2015)
|
Budget Amount |
¥5,330,000 (Direct Cost: ¥4,100,000, Indirect Cost: ¥1,230,000)
Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2013: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Fiscal Year 2012: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
|
Keywords | sentence simplification / decontextualization / syntactic parsing / semantic analysis |
Outline of Final Research Achievements |
In discourse, the frequent use of cohesive ties such as reference expressions and coordinated phrases produces complicated sentences. These sentences not only hinder automated systems (i.e., natural language parsers) from extracting knowledge, but also make it harder to identify mentions of Named Entities (NEs). We propose to revamp the prose style of anatomical textbooks by transforming cohesive discourse into itemized text, which can be accomplished by annotating reference expressions and coordinating conjunctions. We demonstrate that, compared to the original text, the transformed text is easier for machines to process and hence more convenient for identifying mentions of NEs and their relations. Since the transformed text remains human-readable, we believe our approach provides a promising new model for language resources accessible to both humans and machines, improving the computational reusability of textbooks.
|
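The transformation described in the outline, turning an annotated cohesive sentence into itemized text, can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Span` annotation format, the `itemize` function, and the example sentence are all hypothetical and chosen only for illustration. The sketch assumes each annotation marks a character span and lists its replacements, one antecedent for a reference expression and one entry per conjunct for a coordinated phrase, and that spans do not overlap.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass
class Span:
    """Hypothetical annotation: a character span and its replacement texts."""
    start: int               # offset where the annotated expression begins
    end: int                 # offset just past the annotated expression
    alternatives: List[str]  # one antecedent, or one entry per conjunct


def itemize(sentence: str, spans: List[Span]) -> List[str]:
    """Expand a sentence into one simple sentence per combination of
    alternatives. Assumes the spans do not overlap."""
    ordered = sorted(spans, key=lambda s: s.start)
    results = []
    for choice in product(*(s.alternatives for s in ordered)):
        pieces = []
        cursor = 0
        for span, text in zip(ordered, choice):
            pieces.append(sentence[cursor:span.start])  # untouched text
            pieces.append(text)                         # substituted text
            cursor = span.end
        pieces.append(sentence[cursor:])                # trailing text
        results.append("".join(pieces))
    return results


# Made-up example sentence, not taken from the project's corpus.
sentence = "It supplies the biceps and the brachialis."
coordinated = "the biceps and the brachialis"
coord_start = sentence.index(coordinated)

spans = [
    # Reference expression "It" with its annotated antecedent.
    Span(0, 2, ["The musculocutaneous nerve"]),
    # Coordinated phrase with its annotated conjuncts.
    Span(coord_start, coord_start + len(coordinated),
         ["the biceps", "the brachialis"]),
]

items = itemize(sentence, spans)
for item in items:
    print("- " + item)
```

Each output line is a self-contained sentence with the pronoun resolved and a single conjunct, which is the kind of input a parser or NE tagger handles more reliably than the original.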
Report
(5 results)
Research Products
(7 results)