WebGraph-Analysis of Discrete Structures of the Internet and Development of their Optimization Algorithms
Grant-in-Aid for Scientific Research (C)
|Allocation Type||Single-year Grants|
Fundamental theory of informatics
|Research Institution||Osaka Prefecture University|
UNO Yushi, Osaka Prefecture University, Graduate School of Science, Assistant Professor (60244670)
|Project Period (FY)
2003 – 2006
Completed(Fiscal Year 2006)
|Budget Amount
¥3,600,000 (Direct Cost : ¥3,600,000)
Fiscal Year 2006 : ¥500,000 (Direct Cost : ¥500,000)
Fiscal Year 2005 : ¥600,000 (Direct Cost : ¥600,000)
Fiscal Year 2004 : ¥1,000,000 (Direct Cost : ¥1,000,000)
Fiscal Year 2003 : ¥1,500,000 (Direct Cost : ¥1,500,000)
|Keywords||webgraph / data mining / enumeration problem / graph algorithms / community / web algorithms|
In the explosively evolving Web, regarded as a huge database, it is extremely important not only to obtain primary information but also to find hidden information that naive retrieval cannot find. This task is often called 'web mining'. Web structure mining, in particular, aims among other things to find hidden communities in the Web that share common interests in specific topics, by focusing on the webgraph, which represents the link structure among web pages. In this model, the set of web pages of a community, or of its core, is usually assumed to constitute a dense subgraph or a frequent inherent substructure of the webgraph, and web structure mining is realized by extracting such substructures from the webgraph.
As for significant substructures that serve as communities, Kleinberg's hub-authority biclique model is well known and attractive. Some experimental studies in this direction enumerate (a subset of) the bicliques of the webgraph and succeed in mining communities (or their cores). However, since there potentially exists an enormous number of bicliques, it has become quite hard to carry out an exhaustive enumeration and to obtain useful results on the recent Web.
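To make the biclique model concrete, the following is a minimal brute-force sketch, not the efficient algorithm used in the project. It assumes a toy directed graph given as a dictionary of out-links without self-loops; the function name `maximal_bicliques` and the page names are hypothetical. A biclique here is a pair (hubs, authorities) in which every hub links to every authority, and it is maximal when neither side can be extended.

```python
from itertools import combinations


def maximal_bicliques(out_links):
    """Enumerate maximal (hubs, authorities) pairs by brute force.

    out_links maps each page to the pages it links to (no self-loops).
    The loop visits every subset of pages, so it is only usable on toy graphs.
    """
    pages = sorted(out_links)
    found = set()
    for size in range(1, len(pages) + 1):
        for hubs in combinations(pages, size):
            # Authorities: pages that every chosen hub links to.
            authorities = set.intersection(*(set(out_links[h]) for h in hubs))
            if not authorities:
                continue
            # Closure: all pages that link to every one of those authorities.
            closed_hubs = frozenset(
                p for p in pages if authorities <= set(out_links[p])
            )
            found.add((closed_hubs, frozenset(authorities)))
    return found


# Hypothetical toy webgraph: three hub pages and three authority pages.
toy_graph = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a2", "a3"],
    "a1": [],
    "a2": [],
    "a3": [],
}

for hubs, authorities in maximal_bicliques(toy_graph):
    print(sorted(hubs), "->", sorted(authorities))
# This toy graph has 3 maximal bicliques.
```

The subset loop is exponential in the number of pages, which illustrates on a small scale why exhaustive biclique enumeration becomes impractical on a graph the size of the Web.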
Our contributions in this series of research are summarized as follows:
(1) We implemented an efficient algorithm for enumerating the maximal bicliques of a given graph, and ran it on real web data. As a result, we identified the structures that obstruct exhaustive enumeration of bicliques, and also revealed their semantic meanings.
(2) Instead of the conventional structures above, we adopted a new structure called 'isolated cliques' as candidates for communities in the Web. Their definition leads to a very efficient enumeration algorithm, which enables an exhaustive enumeration over the entire Web. As a result, we found that most isolated cliques reside in single domains and represent menu structures, which sometimes indicate harmful link-farm spam. This suggests the effectiveness of isolated cliques as a substructure of the webgraph.
(3) By observing the real webgraph, we found a new frequent substructure of the Web, which we named 'isolated stars'. We designed and implemented an efficient algorithm for their enumeration, and performed an enumeration experiment on real web data. We also confirmed the effectiveness of isolated stars as a substructure of the Web.
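The abstract does not state how isolated cliques are defined. As an illustration only, the sketch below assumes one formulation found in the isolated-clique literature: a clique of k vertices counts as c-isolated when fewer than c·k edges connect it to the rest of the graph. The function name `is_isolated_clique`, the parameter `c`, and the toy graph are hypothetical, and the code only checks a given vertex set; it is not the project's enumeration algorithm.

```python
from itertools import combinations


def is_isolated_clique(adj, members, c=1.0):
    """Check whether `members` forms a c-isolated clique in an undirected graph.

    adj maps each vertex to the set of its neighbours.
    Assumed definition: a clique of k vertices is c-isolated when it has
    fewer than c * k edges leaving it.
    """
    members = set(members)
    # Clique test: every pair of members must be adjacent.
    for u, v in combinations(members, 2):
        if v not in adj[u]:
            return False
    # Count edges with exactly one endpoint inside the clique.
    outgoing = sum(len(adj[u] - members) for u in members)
    return outgoing < c * len(members)


# Hypothetical toy graph: a triangle a-b-c with a path c-d-e hanging off it.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(is_isolated_clique(toy_adj, {"a", "b", "c"}))  # triangle, 1 outgoing edge
print(is_isolated_clique(toy_adj, {"c", "d"}))       # edge c-d, 3 outgoing edges
```

Under this assumed definition, the isolation bound restricts candidates to cliques with few links to the outside, which is consistent with the abstract's remark that the definition itself leads to an efficient enumeration.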
Research Products (28 results)