Project/Area Number |
13224014
|
Research Category |
Grant-in-Aid for Scientific Research on Priority Areas
|
Allocation Type | Single-year Grants |
Review Section |
Science and Engineering
|
Research Institution | University of Tokyo |
Principal Investigator |
KITSUREGAWA Masaru University of Tokyo, Institute of Industrial Science, Professor (40161509)
|
Co-Investigator (Kenkyū-buntansha) |
OGUCHI Masato Ochanomizu Women's University, Department of Science, Associate Professor (60328036)
NAKANO Miyuki University of Tokyo, Institute of Industrial Science, Associate Researcher (30227863)
|
Project Period (FY) |
2001 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥122,100,000 (Direct Cost: ¥122,100,000)
Fiscal Year 2005: ¥28,000,000 (Direct Cost: ¥28,000,000)
Fiscal Year 2004: ¥28,000,000 (Direct Cost: ¥28,000,000)
Fiscal Year 2003: ¥32,000,000 (Direct Cost: ¥32,000,000)
Fiscal Year 2002: ¥34,100,000 (Direct Cost: ¥34,100,000)
|
Keywords | Computer systems / Internet Performance / Data Storage / SAN connected PC Cluster / Contents Archives / Data Mining / Web Contents / Web Link Analysis / Disk Cluster / Web Mining / Web Warehouse / WWW |
Research Abstract |
WWW content is an important national resource for Japan and is expected to be of value to the country's key industries. At present, however, it is used almost exclusively for keyword search through commercial search engines. Our goal was to build a high-performance platform that provides practical access to Web content, so that novel search methods for the WWW can be studied. To that end, we investigated three topics: a system architecture suited to large-scale data-intensive processing, intelligent data management methods for searching large collections of Web pages, and log analysis methods that support efficient use of Web warehouses. 1) Large-scale system architecture for Web warehouses: we proposed the SAN PC cluster, a PC cluster with many disks connected by a SAN (Storage Area Network). We implemented a prototype and evaluated it with large data mining queries. The results show that the proposed system is effective for storing and processing large-scale data such as WWW content. 2) Data management methods for intelligent processing of a large volume of WWW pages: we employed a novel approach based on the hyperlink information among WWW pages, which differs completely from previous methods. Using this approach, we extracted a community chart from the full set of Japanese WWW pages collected and stored in our PC cluster system, and built a WWW map. We also provided a visualization tool for searching and displaying the various relationships among the extracted communities. 3) WWW access log analysis methods: we analyzed WWW access logs to investigate user behavior and to feed the results back into systems such as e-commerce sites and WWW stores. As a result, we were able to extract typical user behavior patterns from global WWW logs (panel logs).
|