2006 Fiscal Year Final Research Report Summary
Research on Efficient and Practical Semi-Structured Data Mining Techniques Applicable to WWW
Project/Area Number |
16300030
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Media informatics/Database
|
Research Institution | Shizuoka University (2006), Tokyo Metropolitan University (2004-2005) |
Principal Investigator |
ISHIKAWA Hiroshi Shizuoka University, Faculty of Informatics, Professor (60326014)
|
Co-Investigator(Kenkyū-buntansha) |
KATAYAMA Kaoru Tokyo Metropolitan University, Graduate School of System Design, Associate Professor (00336520)
|
Project Period (FY) |
2004 – 2006
|
Keywords | database / knowledge discovery / semi-structured data / information system / informatics / XML / subgraph retrieval / subgraph isomorphism |
Research Abstract |
We developed graph mining techniques, then evaluated and improved them as follows. Graphs are used not only in areas of informatics such as pattern recognition but also in fields such as chemistry and biology, so it is important to efficiently find the graphs in a graph database that contain a given input graph. However, the subgraph isomorphism problem is known to be NP-complete. To improve efficiency, we investigated a method that filters out graphs that do not contain the input graph before further processing. The interlacing theorem describes relations between the eigenvalues of matrices. Applying this theorem to graphs, we showed that it can identify the graphs in a graph set for which the induced-subgraph relation with the input graph does not hold. We proposed a graph retrieval method that uses both an eigenvalue-based data structure and this theorem. We evaluated and validated the proposal through several experiments, including a comparison with VF2. Messmer et al. proposed an algorithm that solves the subgraph isomorphism problem efficiently based on graph decomposition. However, when the decomposition generates disconnected graphs, the computational cost increases greatly. We proposed an algorithm that connects the disconnected graphs so that the size of the shared subgraphs is not reduced. Evaluating the performance of a graph-processing system requires test data consisting of diverse graphs. We therefore developed randomized algorithms for generating unlabelled graphs with n vertices. Here "random" means selecting a graph uniformly at random from the set of all unlabelled graphs with n vertices. Dixon and Wilf gave an efficient algorithm for this purpose, but it requires as input Gn, the number of all unlabelled graphs with n vertices, which is hard to calculate. We proposed an algorithm based on Dixon and Wilf's algorithm that does not require Gn but cannot guarantee completely random generation of unlabelled graphs. We also validated our approach by comparing it with VF2. The XPush machine of Gupta et al. has a serious problem: its reconstruction degrades throughput, because reconstruction depends on the total number of XML filters. We addressed this by devising an integrated XPush machine based on subXPush machines, which allows partial exchange of filters. We also proposed methods for controlling access to XML databases and for XML data mining based on the statistics of the data.
|
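The eigenvalue-based pruning described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: it assumes unlabelled, undirected graphs given as adjacency matrices, uses NumPy, and applies the standard Cauchy interlacing condition (if H with m vertices is an induced subgraph of G with n vertices, then the eigenvalues in descending order satisfy λ_i(G) ≥ μ_i(H) ≥ λ_{n-m+i}(G)). The function name `may_contain_induced` and the tolerance `tol` are hypothetical. The check is only a necessary condition, so a graph that passes still needs an exact test such as VF2.

```python
import numpy as np

def may_contain_induced(adj_g, adj_h, tol=1e-9):
    """Return False if interlacing rules out H as an induced subgraph of G.

    adj_g, adj_h: symmetric 0/1 adjacency matrices (hypothetical input format).
    A True result means "not ruled out", not "definitely contained".
    """
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order;
    # reverse them so index 0 holds the largest eigenvalue.
    lam = np.linalg.eigvalsh(np.asarray(adj_g, dtype=float))[::-1]
    mu = np.linalg.eigvalsh(np.asarray(adj_h, dtype=float))[::-1]
    n, m = len(lam), len(mu)
    if m > n:
        return False  # H has more vertices than G
    for i in range(m):
        # Cauchy interlacing: lam[i] >= mu[i] >= lam[n - m + i]
        if not (lam[i] + tol >= mu[i] >= lam[n - m + i] - tol):
            return False
    return True

# Small example graphs: triangle K3, path P3, path P4.
k3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
p3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
p4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

print(may_contain_induced(p4, k3))  # K3 is pruned: its largest eigenvalue (2) exceeds P4's
print(may_contain_induced(p4, p3))  # P3 passes the filter and would go on to an exact test
```

In a retrieval setting, the database graphs' spectra would be computed once and stored, so that only the query graph's eigenvalues need to be computed at search time.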
Research Products
(57 results)