2004 Fiscal Year Final Research Report Summary
Distributed Data Mining Systems for Structured Web Data
Project/Area Number | 14580423 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | KYUSHU UNIVERSITY |
Principal Investigator |
SHOUDAI Takayoshi, Kyushu University, Graduate School, Faculty of Information Science and Electrical Engineering, Department of Informatics, Associate Professor (50226304)
|
Co-Investigator (Kenkyū-buntansha) |
MARUYAMA Osamu, Kyushu University, Graduate School, Faculty of Mathematics, Associate Professor (20282519)
MIYAHARA Tetsuhiro, Hiroshima City University, Faculty of Information Sciences, Associate Professor (90209932)
UCHIDA Tomoyuki, Hiroshima City University, Faculty of Information Sciences, Associate Professor (70264934)
|
Project Period (FY) | 2002 – 2004 |
Keywords | data mining / machine learning / inductive inference / tree structured data / web mining / metasearch / network algorithm |
Research Abstract |
In this research, we studied knowledge discovery from semistructured Web documents such as HTML/XML files. Graph-based and tree-based data mining, and in particular the discovery of frequent structures in graph or tree structured data, have been studied extensively. Our target of discovery is neither a simply frequent pattern nor a pattern that is maximally frequent with respect to a syntactic size such as the number of vertices. Instead, in order to extract useful information from heterogeneous semistructured Web documents, we aim to discover a semantically maximal tree structured pattern that represents a common characteristic of the documents. To represent such patterns, we proposed an ordered tree pattern called a term tree, which is a rooted tree pattern with ordered children and internal structured variables. A term tree differs from other representations of tree structured patterns in that its structured variables can be replaced by arbitrary trees. First, we studied in depth the learnability of classes of term tree languages and identified fundamental classes that are polynomial time learnable. We proved that several classes of term tree languages are polynomial time inductively inferable from positive data, including the class of linear term tree languages with multiple child-port variables, the class of linear term tree languages with contractible variables that are adjacent to leaves, and the class of linear term tree languages with height-constrained variables and no variable chain. Moreover, we showed that some classes of linear term tree languages are exactly learnable in polynomial time using queries. Finally, we presented a metasearch system that uses our efficient learning algorithms for term trees. We implemented this system and showed that it provides effective unified access to multiple existing search sites.
|
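The idea of a term tree described in the abstract, an ordered tree pattern whose structured variables can be replaced by arbitrary trees, can be illustrated with a small sketch. This is not the project's algorithm or its exact formalism. It assumes vertex-labeled ordered trees, one unnamed variable type with a single child port, and a substituted tree with at least two vertices whose root is merged with the variable's parent and one of whose leaves is merged with the child port. The names `Node`, `Var`, `matches`, `match_children` and `contains` are hypothetical, and the matcher is a brute-force search rather than one of the polynomial time procedures reported above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled vertex; in a pattern, children may also be Var objects."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Var:
    """A structured variable hanging under its parent (assumed model).

    `port` is the pattern vertex that sits at the child port, i.e. the
    vertex merged with a leaf of whatever tree replaces the variable.
    """
    port: Node


def matches(pattern: Node, tree: Node) -> bool:
    """True if `tree` can be obtained from `pattern` by replacing variables."""
    return (pattern.label == tree.label
            and match_children(pattern.children, tree.children))


def match_children(ps: list, ts: list) -> bool:
    """Match an ordered list of pattern children against data children."""
    if not ps:
        return not ts
    head, rest = ps[0], ps[1:]
    if isinstance(head, Var):
        # The replacing tree contributes a contiguous run of k >= 1 children;
        # somewhere inside that run a vertex must match the child port.
        for k in range(1, len(ts) + 1):
            if (any(contains(head.port, c) for c in ts[:k])
                    and match_children(rest, ts[k:])):
                return True
        return False
    # An ordinary pattern vertex matches exactly one data child, in order.
    return bool(ts) and matches(head, ts[0]) and match_children(rest, ts[1:])


def contains(port: Node, tree: Node) -> bool:
    """True if `tree` or one of its descendants matches the port vertex."""
    return matches(port, tree) or any(contains(port, c) for c in tree.children)


# Hypothetical example: a pattern for records with a title followed by
# arbitrary structure that eventually contains a price.
pattern = Node("item", [Node("title"), Var(Node("price"))])
doc1 = Node("item", [Node("title"), Node("info", [Node("price")]), Node("note")])
doc2 = Node("item", [Node("title"), Node("note")])
```

Under these assumptions, `matches(pattern, doc1)` holds because the variable absorbs the `info` and `note` subtrees and finds `price` inside `info`, while `matches(pattern, doc2)` fails because no `price` vertex is available for the child port.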
Research Products
(34 results)