Project/Area Number |
12208001
|
Research Category |
Grant-in-Aid for Scientific Research on Priority Areas
|
Allocation Type | Single-year Grants |
Review Section |
Biological Sciences
|
Research Institution | The University of Tokyo |
Principal Investigator |
TAKAGI Toshihisa The University of Tokyo, Graduate School of Frontier Sciences, Professor (30110836)
|
Co-Investigator (Kenkyū-buntansha) |
TSUJII Junichi The University of Tokyo, Interfaculty Initiative in Information Studies, Professor (20026313)
TAKAI Takako The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (60222840)
FUKUDA Kenichiro National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center, Research Scientist (10357890)
KOIKE Asako Hitachi Ltd., Central Research Laboratory, Life Science Research Laboratory, Senior Research Scientist
|
Project Period (FY) |
2000 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥183,200,000 (Direct Cost: ¥183,200,000)
Fiscal Year 2004: ¥36,000,000 (Direct Cost: ¥36,000,000)
Fiscal Year 2003: ¥36,000,000 (Direct Cost: ¥36,000,000)
Fiscal Year 2002: ¥46,200,000 (Direct Cost: ¥46,200,000)
Fiscal Year 2001: ¥65,000,000 (Direct Cost: ¥65,000,000)
|
Keywords | ontology / genome databases / signal transduction / information extraction from literature / natural language processing / tagged corpus / gene dictionary / pathway databases / kinase database |
Research Abstract |
Databases of gene and protein interactions and functions extracted from the literature are indispensable for systematically understanding life from the flood of biological data such as genome sequences, gene expression, and molecular interactions. From this perspective, we have been tackling two challenges: 1) automatically extracting knowledge of biological functions from the literature, and 2) representing and utilizing the extracted knowledge on computers. The following are brief descriptions of our efforts. a) We developed a knowledge extraction system. We have largely established a method for extracting information on interactions among genes, proteins, and chemical compounds from the literature. Our system achieved a recall of about 50% and a precision of about 90%. b) We developed dictionaries of gene names and gene family names that are used to identify those names in the literature. GENA, one of the dictionaries, stores about 880,000 gene names and, depending on the organism, covers 90-95% of all the genes appearing in the literature. Using the dictionaries and the extraction system described in a), we developed and published an interaction database called PRIME and a dictionary of biological functional terms. PRIME stores about three million interactions for six eukaryotes, including human and rat. c) We prepared a corpus and an ontology for knowledge extraction. Developing and evaluating a knowledge extraction system requires a tagged corpus and an ontology that defines domain-specific terms. We therefore developed and published the GENIA corpus, which consists of 2,000 MEDLINE abstracts whose terms are annotated with semantic and part-of-speech tags. In addition, we developed the GENIA ontology, which is used for adding semantic tags to terms in the literature.
|
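The recall and precision figures quoted in the Research Abstract can be read as set-overlap ratios between the interactions a system extracts and a hand-annotated gold standard. The sketch below is only an illustration of that arithmetic, not the project's actual evaluation procedure: the function name `precision_recall`, the triple representation, and the toy gene identifiers are all assumptions made for this example.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold-standard items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: 18 gold interactions as (agent, relation, target) triples.
gold = {(f"geneA{i}", "activates", f"geneB{i}") for i in range(18)}

# The imagined system finds 9 of them and adds 1 spurious interaction.
extracted = {(f"geneA{i}", "activates", f"geneB{i}") for i in range(9)}
extracted.add(("geneX", "inhibits", "geneY"))

p, r = precision_recall(extracted, gold)
print(p, r)  # 0.9 0.5
```

With these toy numbers, 9 of the 10 extracted interactions are correct (precision 0.9) while only 9 of the 18 gold interactions are recovered (recall 0.5), which mirrors the high-precision, moderate-recall profile reported in the abstract.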