Project/Area Number |
13680459
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Hiroshima City University |
Principal Investigator |
MIYAHARA Tetsuhiro Hiroshima City University, Faculty of Information Sciences, Associate Professor (90209932)
|
Co-Investigator (Kenkyū-buntansha) |
KUBOYAMA Tetsuji The University of Tokyo, Center for Collaborative Research, Research Associate (80302660)
SHOUDAI Takayoshi Kyushu University, Department of Informatics (Graduate School/Faculty of Information Science and Electrical Engineering), Associate Professor (50226304)
UCHIDA Tomoyuki Hiroshima City University, Faculty of Information Sciences, Associate Professor (70264934)
|
Project Period (FY) |
2001 – 2003
|
Project Status |
Completed (Fiscal Year 2003)
|
Budget Amount |
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2003: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2002: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2001: ¥1,800,000 (Direct Cost: ¥1,800,000)
|
Keywords | data mining / knowledge discovery / graph-structured data / semistructured data / tree structured pattern / HTML / XML file / inductive inference |
Research Abstract |
The purpose of this research project is to establish theoretical foundations for data mining systems that work on graph-structured or tree-structured data. In recent years, the number of Web documents such as HTML and XML files has grown rapidly. Such Web documents have no rigid structure and are called semistructured data; in general, they are represented by rooted trees. We have proposed methods for discovering frequent tree-structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis. A tag tree pattern is an edge-labeled tree whose children may be ordered or unordered and which contains structured variables. An edge label is a tag or a keyword appearing in such Web documents, and a variable can be substituted by an arbitrary tree, so tag tree patterns are well suited to representing tree-structured patterns in these documents. Information extraction from semistructured data is becoming increasingly important. To extract meaningful or interesting content from semistructured data, we need to extract the structured patterns that the data have in common. We have presented a method for extracting characteristic tag tree patterns from irregular semistructured data, using an algorithm that finds a minimally generalized tag tree pattern explaining the given data. We have also given various algorithms for learning term trees, which are tree-structured patterns with structured variables, from tree-structured data, since such learning algorithms provide theoretical foundations for data mining from semistructured data.
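To make the idea of testing a tag tree pattern against documents more concrete, here is a minimal Python sketch. It rests on simplifying assumptions that do not come from the project: children are ordered only, each variable edge (marked by the hypothetical constant `VAR`) must lead to a leaf in the pattern, and substituting a variable replaces it with exactly one labeled edge plus an arbitrary subtree. The names `edge`, `root`, `matches`, `support`, and `is_frequent` are illustrative only. The sketch covers just the check of whether a given pattern is frequent in a document collection, not the discovery, generalization, or learning algorithms described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical marker: in a pattern, an edge labeled VAR is a structured
# variable. Data trees use only real tags or keywords as edge labels.
VAR: Optional[str] = None


@dataclass
class Node:
    # Ordered list of (edge_label, child) pairs hanging below this vertex.
    children: list[tuple[Optional[str], Node]] = field(default_factory=list)


def edge(label: Optional[str], *kids: tuple[Optional[str], Node]) -> tuple[Optional[str], Node]:
    """Build one labeled edge leading to a child vertex with the given sub-edges."""
    return (label, Node(list(kids)))


def root(*kids: tuple[Optional[str], Node]) -> Node:
    """Build the root vertex of a data tree or a pattern."""
    return Node(list(kids))


def matches(pattern: Node, tree: Node) -> bool:
    """Return True if `tree` can be obtained from `pattern` by replacing each
    variable edge with one labeled edge plus an arbitrary subtree
    (the simplified substitution rule assumed in this sketch)."""
    if len(pattern.children) != len(tree.children):
        return False
    for (p_label, p_child), (t_label, t_child) in zip(pattern.children, tree.children):
        if p_label is VAR:
            continue  # the variable absorbs this whole branch of the tree
        if p_label != t_label or not matches(p_child, t_child):
            return False
    return True


def support(pattern: Node, documents: list[Node]) -> float:
    """Fraction of documents that the pattern matches."""
    if not documents:
        return 0.0
    return sum(matches(pattern, doc) for doc in documents) / len(documents)


def is_frequent(pattern: Node, documents: list[Node], min_support: float) -> bool:
    """A pattern is frequent if its support reaches the given threshold."""
    return support(pattern, documents) >= min_support
```

For example, the pattern `root(edge("html", edge("body", edge(VAR), edge("p", edge("price")))))` matches any document whose `body` has exactly two branches, the second being `p` followed by the keyword `price`, whatever the first branch contains. Requiring equal-length child lists keeps matching a simple linear walk over the pattern; handling unordered children would instead require pairing child branches, for instance with a bipartite matching.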
|