Project/Area Number |
18002007
|
Research Category |
Grant-in-Aid for Specially Promoted Research
|
Allocation Type | Single-year Grants |
Review Section |
Science and Engineering
Engineering
|
Research Institution | The University of Tokyo |
Principal Investigator |
TSUJII Junichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (20026313)
|
Co-Investigator(Kenkyū-buntansha) |
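YONEZAWA Akinori The University of Tokyo, Graduate School of Information Science and Technology, Professor (00133116)
TAURA Kenjiro The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (90282714)
MIYAO Yusuke The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (00343096)
MATSUZAKI Takuya The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (40463872)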
|
Research Collaborator |
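KANO Yoshinobu The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
OHTA Tomoko The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
SAETRE Rune The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
SHIBATA Takeshi The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
MIWA Makoto The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
PYYSALO Sampo Mikael The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher
KIM Jin-Dong The University of Tokyo, Interfaculty Initiative in Information Studies, Project Lecturer
SAGAE Kenji The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
SAGAE T. Alicia The University of Tokyo, Graduate School of Information Science and Technology, Research Assistant
WANG Xiangli The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
TSUNAKAWA Takashi The University of Tokyo, Graduate School of Information Science and Technology, Project Researcher
HARA Tadayoshi The University of Tokyo, Interfaculty Initiative in Information Studies, Project Researcher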
|
Project Period (FY) |
2006 – 2010
|
Project Status |
Completed (Fiscal Year 2010)
|
Budget Amount |
¥499,330,000 (Direct Cost: ¥384,100,000, Indirect Cost: ¥115,230,000)
Fiscal Year 2010: ¥95,030,000 (Direct Cost: ¥73,100,000, Indirect Cost: ¥21,930,000)
Fiscal Year 2009: ¥103,220,000 (Direct Cost: ¥79,400,000, Indirect Cost: ¥23,820,000)
Fiscal Year 2008: ¥101,010,000 (Direct Cost: ¥77,700,000, Indirect Cost: ¥23,310,000)
Fiscal Year 2007: ¥104,910,000 (Direct Cost: ¥80,700,000, Indirect Cost: ¥24,210,000)
Fiscal Year 2006: ¥95,160,000 (Direct Cost: ¥73,200,000, Indirect Cost: ¥21,960,000)
|
Keywords | Language understanding / Semantic processing / Text mining / Context processing / Intelligent retrieval |
Research Abstract |
The objective of the project was to apply the methodology of combining statistical modeling with structure-based symbolic processing, which had proven successful in sentence parsing, to more challenging tasks such as deep semantic processing, knowledge-based information extraction, and contextual processing. We achieved significant results in (1) efficient and robust deep parsing based on a linguistically sound formalism, (2) a large-scale semantically annotated corpus for the biology domain (the GENIA corpus), (3) information extraction programs (named entity recognizers and event recognizers) for the biology domain that combine the deep parsing in (1) with structural machine learning algorithms, and (4) workflow software for data-centered parallel processing. The GENIA corpus in (2) has been recognized as the gold-standard corpus for text mining research in biology and has been used by many groups around the world. It was twice adopted as the training and test corpus for international shared-task competitions (BioNLP 09 and BioNLP 11). The extraction programs developed in (3) achieved state-of-the-art performance in these competitions. A system built on (1) and (4) showed that the technology developed in this project is practical for processing real-world text. We successfully processed the whole of MEDLINE (20 million abstracts, more than 2 billion sentences) and indexed it semantically in less than a week. The processing results for MEDLINE have been made publicly available through an intelligent document retrieval system (MEDIE).
|