2005 Fiscal Year Final Research Report Summary

Constructing the Web-ware house for Web Mining

Research Project

Project/Area Number 13224014
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Science and Engineering
Research Institution University of Tokyo

Principal Investigator

KITSUREGAWA Masaru  University of Tokyo, Institute of Industrial Science, Professor (40161509)

Co-Investigator (Kenkyū-buntansha) OGUCHI Masato  Ochanomizu Women's University, Department of Science, Associate Professor (60328036)
NAKANO Miyuki  University of Tokyo, Institute of Industrial Science, Associate Researcher (30227863)
Project Period (FY) 2001 – 2005
Keywords Computer systems / Internet Performance / Data Storage / SAN connected PC Cluster / Contents Archives / Data Mining / Web Contents / Web Link Analysis
Research Abstract

WWW content is an important national resource for Japan and is expected to be of use to the country's key industries. At present, however, it is used almost exclusively for keyword search through commercial search engines.
Our goal was to construct a high-performance platform that provides practical access to WWW content, so that novel search methods for the WWW can be studied. To this end, we investigated three topics: a system architecture suited to large-scale, data-intensive processing; intelligent data management methods for searching a large collection of WWW pages; and log analysis methods that support efficient use of Web warehouses.
1) A large-scale system architecture for Web warehouses: we proposed a PC cluster whose many disks are connected through a SAN (Storage Area Network), that is, a SAN-connected PC cluster. We implemented a prototype and evaluated it with a large data mining query. The results show that the proposed system is effective for storing and processing large-scale data such as WWW content.
2) Data management methods for intelligent processing of a large volume of WWW pages: we employed a novel approach based on the hyperlink information among WWW pages, which differs fundamentally from previous methods. Using this approach, we extracted a community chart from the entire set of Japanese WWW pages collected and stored in our PC cluster system, and built a map of the WWW. We then provided a visualization tool for searching and displaying the various relationships among the extracted communities.
3) WWW access log analysis methods: we analyzed WWW access logs to investigate user behavior and to feed the results back into systems such as e-commerce sites and online stores. As a result, we were able to extract typical user behavior patterns from global WWW logs (panel logs).
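As a rough illustration of what grouping pages by hyperlink information (item 2) can look like, the sketch below uses simple co-citation: two pages are treated as related when enough other pages link to both of them, and related pages are merged into one community. This is a minimal sketch under stated assumptions, not the method used in the project. The function names, the `min_shared` threshold, and the toy link data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(links):
    """For each unordered pair of pages, count how many source pages link to both."""
    counts = defaultdict(int)
    for targets in links.values():
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


def communities(links, min_shared=2):
    """Merge pages joined by co-citation pairs seen at least min_shared times."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in cocitation_counts(links).items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    # Return each community as a sorted list, ordered by its first member.
    return sorted((sorted(g) for g in groups.values() if len(g) > 1),
                  key=lambda g: g[0])


# Hypothetical toy data: each key is a page, each value the pages it links to.
links = {
    "hub1": ["a.example", "b.example", "c.example"],
    "hub2": ["a.example", "b.example"],
    "hub3": ["b.example", "c.example"],
    "hub4": ["x.example", "y.example"],
    "hub5": ["x.example", "y.example"],
}
result = communities(links)
print(result)
```

With this toy data, the three `*.example` pages cited together by `hub1` to `hub3` end up in one community and the two pages cited by `hub4` and `hub5` in another. A real Web-scale link graph would need a more selective relatedness measure and a distributed implementation.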

  • Research Products

    (12 results)


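  • [Journal Article] A Study on a Method for Finding Related Words Using Global Web Access Logs (2005)

    • Author(s)
      Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IPSJ Transactions on Databases (TOD) Vol. 46, No. SIG 8 (TOD 26)

      Pages: 82-92

    • Description
      From the "Summary of Research Results Report (Japanese)"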
  • [Journal Article] A Study for Related Words Finding Method Using Global Web Access Logs (2005)

    • Author(s)
      Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IPSJ TOD Vol. 46, No. SIG 8 (TOD 26)

      Pages: 82-92

    • Description
      From the "Summary of Research Results Report (English)"
  • [Journal Article] Web Community Chart: a Tool for Navigating the Web and Observing its Evolution (2004)

    • Author(s)
      Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IEICE Transactions on Information and Systems Vol. E86-D, No. 6

      Pages: 1024-1031

    • Description
      From the "Summary of Research Results Report (Japanese)"
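  • [Journal Article] Web Community Chart: a Tool for Navigating Numerous Web Pages by Related Topics (2004)

    • Author(s)
      Masashi Toyoda, Satoshi Yoshida, Masaru Kitsuregawa
    • Journal Title

      IEICE Transactions on Information and Systems D-I (Japanese edition) Vol. J87-D-I, No. 2

      Pages: 256-265

    • Description
      From the "Summary of Research Results Report (Japanese)"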
  • [Journal Article] Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities (2004)

    • Author(s)
      Noriko Imafuji, Masaru Kitsuregawa
    • Journal Title

      Special Section on Information Processing Technology for Web Utilization (IEICE Transactions D, English edition) Vol. E87-D, No. 2

      Pages: 407-415

    • Description
      From the "Summary of Research Results Report (Japanese)"
  • [Journal Article] Web Community Chart: a Tool for Navigating Numerous Web Pages by Related Topics (2004)

    • Author(s)
      Masashi Toyoda, Satoshi Yoshida, Masaru Kitsuregawa
    • Journal Title

      The IEICE Transactions on Information and Systems D-I Vol. J87-D-I, No. 2

      Pages: 256-265

    • Description
      From the "Summary of Research Results Report (English)"
  • [Journal Article] Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities (2004)

    • Author(s)
      Noriko Imafuji, Masaru Kitsuregawa
    • Journal Title

      IEICE Transactions on Information and Systems, Special Section on Information Processing Technology for Web Utilization, Vol. E87-D, No. 2

      Pages: 407-415

    • Description
      From the "Summary of Research Results Report (English)"
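  • [Journal Article] A Proposal of a Global Web Access Log Analysis Method Using Web Communities (2003)

    • Author(s)
      Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IPSJ Transactions on Databases (IPSJ TODS) Vol. 44, No. SIG 13

      Pages: 32-44

    • Description
      From the "Summary of Research Results Report (Japanese)"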
  • [Journal Article] A Study for Analysis of Web Access Logs with Web Communities (2003)

    • Author(s)
      Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IPSJ TOD Vol. 44, No. SIG 13 (TOD 20)

      Pages: 32-44

    • Description
      From the "Summary of Research Results Report (English)"
  • [Journal Article] Web Community Chart: a Tool for Navigating the Web and Observing its Evolution (2003)

    • Author(s)
      Masashi Toyoda, Masaru Kitsuregawa
    • Journal Title

      IEICE Transactions on Information and Systems Vol. E86-D, No. 6

      Pages: 1024-1031

    • Description
      From the "Summary of Research Results Report (English)"
  • [Journal Article] Runtime Data Declustering over SAN-Connected PC Cluster System (2002)

    • Author(s)
      Masato Oguchi, Masaru Kitsuregawa
    • Journal Title

      Poster paper, IEEE International Conference on Data Engineering (ICDE 2002)

      Pages: 275

    • Description
      From the "Summary of Research Results Report (Japanese)"
  • [Journal Article] Runtime Data Declustering over SAN-Connected PC Cluster System (2002)

    • Author(s)
      Masato Oguchi, Masaru Kitsuregawa
    • Journal Title

      Proceedings of 18th IEEE Int'l Conference on Data Engineering (ICDE 2002)

      Pages: 275

    • Description
      From the "Summary of Research Results Report (English)"

Published: 2008-05-27  
