Budget Amount |
¥3,500,000 (Direct Cost: ¥3,500,000)
Fiscal Year 2001: ¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 2000: ¥2,500,000 (Direct Cost: ¥2,500,000)
|
Research Abstract |
We studied the integration of search engine and data mining technologies and its application to large, heterogeneous genome databases. First, we attempted automatic clustering of texts in genome databases based on the index information stored in the search engine. However, indices of individual "words" were not sufficient for precise clustering, because the frequency of technical "terms" is essential in scientific text data such as genome databases. We therefore constructed a kind of ontology (i.e., a controlled collection of technical terms) from genome databases. Finally, we enhanced our search engine for genome databases by introducing a data mining algorithm called association rule mining, based on two kinds of relationships among data entries: link information and the ontology. During iterative search and retrieval against genome databases, our system can show the link and language information that is common and specific to a set of data entries (i.e., a search result) in which a user is interested. Because the system responds sufficiently quickly, this service is provided via the WWW.
|
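The abstract names association rule mining over link information and ontology terms but gives no implementation details. The following is only a minimal sketch of the general technique, not the project's actual code. It assumes each data entry in a search result is represented as a set of features, and it mines pairwise rules only (a simplification of full frequent-itemset algorithms such as Apriori). The function name `mine_rules`, the thresholds, and the `link:` / `term:` feature labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(entries, min_support=0.5, min_confidence=0.8):
    """Mine pairwise association rules from a list of feature sets.

    entries: list of sets; each set holds the features of one data entry
             (hypothetical labels such as "link:PDB" or "term:kinase").
    Returns a sorted list of (antecedent, consequent, support, confidence).
    """
    n = len(entries)
    if n == 0:
        return []

    single = Counter()  # number of entries containing each feature
    pair = Counter()    # number of entries containing each feature pair
    for features in entries:
        single.update(features)
        pair.update(combinations(sorted(features), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n  # fraction of entries containing both features
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical search result: four entries with link and term features.
    result_set = [
        {"link:PDB", "term:kinase"},
        {"link:PDB", "term:kinase", "term:ATP binding"},
        {"link:PDB", "term:kinase"},
        {"term:membrane"},
    ]
    for lhs, rhs, support, confidence in mine_rules(result_set):
        print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy data, the link to one database and one ontology term co-occur in three of four entries, so rules between them pass both thresholds, while rarer features are filtered out by the support threshold. Features surfaced this way correspond to the "common" information for a search result in the sense described in the abstract, under the stated assumptions.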