Constructing the Web-ware house for Web Mining

Research Project

Project/Area Number	13224014
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	University of Tokyo
Principal Investigator	KITSUREGAWA Masaru, University of Tokyo, Institute of Industrial Science, Professor (40161509)
Co-Investigator (Kenkyū-buntansha)	OGUCHI Masato, Ochanomizu Women's University, Department of Science, Associate Professor (60328036); NAKANO Miyuki, University of Tokyo, Institute of Industrial Science, Associate Researcher (30227863)
Project Period (FY)	2001 – 2005
Project Status	Completed (Fiscal Year 2005)
Budget Amount	¥122,100,000 (Direct Cost: ¥122,100,000); Fiscal Year 2005: ¥28,000,000 (Direct Cost: ¥28,000,000); Fiscal Year 2004: ¥28,000,000 (Direct Cost: ¥28,000,000); Fiscal Year 2003: ¥32,000,000 (Direct Cost: ¥32,000,000); Fiscal Year 2002: ¥34,100,000 (Direct Cost: ¥34,100,000)
Keywords	Computer systems / Internet Performance / Data Storage / SAN connected PC Cluster / Contents Archives / Data Mining / Web Contents / Web Link Analysis / Disk Cluster / Contents Archive / Web Mining / Web Warehouse / WWW
Research Abstract	Web content is an important national resource for Japan and is expected to benefit the country's key industries. At present, however, it is used almost exclusively for keyword search through commercial search engines. The goal of this research was to build a high-performance platform that gives practical access to Web content, so that novel search methods for the WWW can be studied. To that end, we investigated three topics: a system architecture suited to large-scale, data-intensive processing; intelligent data management methods for searching large collections of Web pages; and log analysis methods that make Web-ware houses more useful.
1) Large-scale system architecture for Web-ware houses: we proposed a SAN PC cluster, that is, a PC cluster whose many disks are connected through a SAN (Storage Area Network). We implemented a prototype and evaluated it with large data mining queries. The results show that the proposed system is effective for storing and processing large-scale data such as WWW content.
2) Data management methods for intelligent processing of a large volume of WWW pages: we adopted a novel approach based on the hyperlink information among WWW pages, which differs completely from previous methods. Using this approach, we extracted a community chart from the entire set of Japanese WWW pages collected and stored in our PC cluster system, and produced a map of the WWW. We also provided a visualization tool for searching and displaying the various relationships among the extracted communities.
3) WWW access log analysis methods: we analyzed WWW access logs to investigate user behavior and to feed the results back into systems such as e-commerce sites and online stores. From global WWW logs (panel logs) we were able to extract typical user behavior patterns.
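Item 2 of the abstract describes extracting communities from the hyperlink information among pages. As a rough illustration of link-based analysis in general, and not the project's own algorithm, the sketch below computes HITS-style hub and authority scores on a tiny made-up link graph. The page names, the `hits` function, and the iteration count are all assumptions introduced for this example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph.

    links maps each page to the list of pages it links to.
    This is a generic HITS-style iteration, used here only as an illustration.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: three fan pages linking to two team pages.
links = {
    "fan1": ["teamA", "teamB"],
    "fan2": ["teamA", "teamB"],
    "fan3": ["teamA"],
    "teamA": [],
    "teamB": [],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
print(top_authority)
```

Pages that are linked from many good hubs receive high authority scores, and groups of hubs and authorities that reinforce each other are one simple signal for a topical cluster of pages.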

Report

(6 results)

2005 Annual Research Report, Final Research Report Summary
2004 Annual Research Report
2003 Annual Research Report
2002 Annual Research Report
2001 Annual Research Report

Research Products
(42 results)

Journal Article (24 results), Publications (18 results)

[Journal Article] A Search Support System Using Large-Scale Access Logs (2006)
- Author(s)
  Shingo Otsuka, Masaru Kitsuregawa
- Journal Title
  
  17th IEICE Data Engineering Workshop, 1B-02
- Related Report
  2005 Annual Research Report
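[Journal Article] A Study for Related Words Finding Method Using Global Web Access Logs (2005)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  IPSJ Transactions on Databases (TOD), Vol. 46, No. SIG 8 (TOD 26)
  
  Pages: 82-92
- NAID
  110002768781
- Description
  From the Final Research Report Summary (Japanese)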
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Study for Related Words Finding Method Using Global Web Access Logs (2005)
- Author(s)
  SHINGO OTSUKA, MASASHI TOYODA, MASARU KITSUREGAWA
- Journal Title
  
  TOD Vol.46, No.SIG8(TOD26)
  
  Pages: 82-92
- NAID
  110002768781
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Study on Finding Related Words Using Global Web Access Logs (2005)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  DBSJ Letters (Database Society of Japan), Vol. 3, No. 2
  
  Pages: 1-4
- NAID
  40007013085
- Related Report
  2005 Annual Research Report
[Journal Article] A Study for Related Words Finding Method Using Global Web Access Logs (2005)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  IPSJ Transactions on Databases (TOD), Vol. 46, No. SIG 8 (TOD 26)
  
  Pages: 82-92
- NAID
  110002768781
- Related Report
  2005 Annual Research Report
[Journal Article] Search Term Clustering Using Global Web Access Logs (2005)
- Author(s)
  Shingo Otsuka, Masaru Kitsuregawa
- Journal Title
  
  IPSJ SIG Technical Reports, Vol. 2005, No. 67, 2005-DBS-137(I)
  
  Pages: 191-198
- NAID
  110002952350
- Related Report
  2005 Annual Research Report
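[Journal Article] Analysis of File Operation Performance in IP-SAN Using a Trace System (2005)
- Author(s)
  Saneyasu Yamaguchi, Masato Oguchi, Masaru Kitsuregawa
- Journal Title
  
  FIT 2005 (4th Forum on Information Technology), Proceedings of General Sessions, Part 2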
  
  Pages: 85-86
- Related Report
  2005 Annual Research Report
[Journal Article] A Study of File Access Performance in iSCSI Network Storage (2005)
- Author(s)
  Saneyasu Yamaguchi, Masato Oguchi, Masaru Kitsuregawa
- Journal Title
  
  IPSJ SIG Technical Reports, 2005-DBS-137(II)
  
  Pages: 569-574
- NAID
  110002952400
- Related Report
  2005 Annual Research Report
[Journal Article] Analysis of Link Structure Growth Patterns in the Emergence of Web Communities (2005)
- Author(s)
  Noriko Imafuji, Masaru Kitsuregawa
- Journal Title
  
  16th IEICE Data Engineering Workshop (DEWS2005), 5C-o1
- NAID
  40007013161
- Related Report
  2004 Annual Research Report
[Journal Article] Web Community Chart : a Tool for Navigating the Web and Observing its Evolution (2004)
- Author(s)
  Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  IEICE Transactions on Information and Systems E86-D, No. 6
  
  Pages: 1024-1031
- NAID
  110003213757
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2005 Final Research Report Summary
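[Journal Article] Web Community Chart : a Tool for Navigating Numerous Web Pages By Related Topics (2004)
- Author(s)
  Masashi Toyoda, Satoshi Yoshida, Masaru Kitsuregawa
- Journal Title
  
  IEICE Transactions on Information and Systems D-I (Japanese Edition), Vol. J87-D-I, No. 2
  
  Pages: 256-265
- NAID
  110003171304
- Description
  From the Final Research Report Summary (Japanese)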
- Related Report
  2005 Final Research Report Summary
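[Journal Article] Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities (2004)
- Author(s)
  Noriko Imafuji, Masaru Kitsuregawa
- Journal Title
  
  IEICE Transactions on Information and Systems (English edition D), Special Section on Information Processing Technology for Web Utilization, Vol. E87-D, No. 2
  
  Pages: 407-415
- NAID
  110003223363
- Description
  From the Final Research Report Summary (Japanese)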
- Related Report
  2005 Final Research Report Summary
[Journal Article] Web Community Chart : a Tool for Navigating Numerous Web Pages By Related Topics (2004)
- Author(s)
  Masashi TOYODA, Satoshi YOSHIDA, Masaru KITSUREGAWA
- Journal Title
  
  The IEICE Transactions on Information and Systems D-I, Vol. J87-D-I, No. 2
  
  Pages: 256-265
- NAID
  110003171304
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Journal Article] Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities (2004)
- Author(s)
  Noriko Imafuji, Masaru Kitsuregawa
- Journal Title
  
  The IEICE Transactions on Information and Systems, Special Section on Information Processing Technology for Web Utilization
  
  Pages: 407-415
- NAID
  110003223363
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Journal Article] FP-tax : Tree Structure Based Generalized Association Rule Mining (2004)
- Author(s)
  Iko Pramudiono, Masaru Kitsuregawa
- Journal Title
  
  The 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD04)
- Related Report
  2004 Annual Research Report
[Journal Article] Analysis of User Behavior Using Global Web Access Logs (2004)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  Summer Database Workshop DBWS2004, IPSJ SIG Technical Reports 2004-DBS-134(I)
  
  Pages: 17-24
- NAID
  110003174720
- Related Report
  2004 Annual Research Report
[Journal Article] Yellow Page driven Methods of Collecting and Scoring Spatial Web Documents (2004)
- Author(s)
  Takeshi Sagara, Masaru Kitsuregawa
- Journal Title
  
  Workshop on Geographic Information Retrieval SIGIR 2004
  
  Pages: 4-8
- Related Report
  2004 Annual Research Report
[Journal Article] Extracting User Behavior by Web Communities Technology on Global Web Logs (2004)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Jun Hirai, Masaru Kitsuregawa
- Journal Title
  
  Proc. of 15th International Conference on Database and Expert Systems Applications (DEXA'2004)
  
  Pages: 957-968
- Related Report
  2004 Annual Research Report
[Journal Article] A Study on Finding Related Words Using Global Web Access Logs (2004)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  Database Society of Japan (DBSJ) Letters, Vol. 3, No. 2
  
  Pages: 1-4
- NAID
  40007013085
- Related Report
  2004 Annual Research Report
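[Journal Article] A Study for Analysis of Web Access Logs with Web Communities (2003)
- Author(s)
  Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  IPSJ Transactions on Databases (IPSJ TODS), Vol. 44, No. SIG13
  
  Pages: 32-44
- Description
  From the Final Research Report Summary (Japanese)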
- Related Report
  2005 Final Research Report Summary
[Journal Article] A Study for Analysis of Web Access Logs with Web Communities (2003)
- Author(s)
  SHINGO OTSUKA, MASASHI TOYODA, MASARU KITSUREGAWA
- Journal Title
  
  IPSJ TOD Vol.44, No.SIG13(TOD20)
  
  Pages: 32-44
- NAID
  110002712014
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Journal Article] Web Community Chart : a Tool for Navigating the Web and Observing its Evolution (2003)
- Author(s)
  Masashi Toyoda, Masaru Kitsuregawa
- Journal Title
  
  IEICE Transactions on Information and Systems E86-D, No. 6
  
  Pages: 1024-1031
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Journal Article] Runtime Data Declustering over SAN-Connected PC Cluster System (2002)
- Author(s)
  Masato Oguchi, Masaru Kitsuregawa
- Journal Title
  
  Poster paper, IEEE International Conference on Data Engineering (ICDE 2002)
  
  Pages: 275-275
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2005 Final Research Report Summary
[Journal Article] Runtime Data Declustering over SAN-Connected PC Cluster System (2002)
- Author(s)
  Masato Oguchi, Masaru Kitsuregawa
- Journal Title
  
  Proceedings of 18th IEEE Int'l Conference on Data Engineering (ICDE2002)
  
  Pages: 275-275
- Description
  From the Final Research Report Summary (English)
- Related Report
  2005 Final Research Report Summary
[Publications] Noriko Imafuji, Masaru Kitsuregawa: "Finding Web Communities by Maximum Flow Algorithm using Well-Assigned Edge Capacities" IEICE Transactions on Information and Systems (English edition D). Vol.E87-D, No.2. 407-415 (2004)
- Related Report
  2003 Annual Research Report
[Publications] Shingo Otsuka, Masashi Toyoda, Masaru Kitsuregawa: "A Study for Analysis of Web Access Logs with Web Communities" IPSJ Transactions on Databases (IPSJ TOD). Vol.44, No.SIG13(TOD20). 32-44 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Noriko Imafuji, Masaru Kitsuregawa: "Finding a Web Community by Maximum Flow Algorithm with HITS Score Based Capacity" Proceedings of 8th International Conference on Database Systems for Advanced Applications (DASFAA2003). 101-106 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Iko Pramudiono, Masaru Kitsuregawa: "Tree Structure based Parallel Frequent Pattern Mining on PC Cluster" Proceedings of 14th International Conference on Database and Expert Systems Applications (DEXA2003). 537-544 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Masaru Kitsuregawa, Iko Pramudiono: "PC Cluster Based Parallel Frequent Pattern Mining and Parallel Web Access Pattern Mining" Proceedings of Third International Workshop on Databases in Networked Information Systems (DNIS2003). 172-17 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Iko Pramudiono, Masaru Kitsuregawa: "Shared Nothing Parallel Execution of FP-growth" DBSJ Letters (Database Society of Japan). Vol.2, No.1. 43-46 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Bowo Prasetyo, Masaru Kitsuregawa, et al.: "Naviz : Website Navigational Behavior Visualizer" Proc. of 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2002). (2002)
- Related Report
  2002 Annual Research Report
[Publications] Yusuke Ohura, Masaru Kitsuregawa, et al.: "Experiments on Query Expansion for Internet Yellow Page Services Using Web Log Mining" 28th International Conference on Very Large Data Bases (VLDB 2002). (2002)
- Related Report
  2002 Annual Research Report
[Publications] Wang Y., Kitsuregawa M.: "On Combining Link and Contents Information for Web Page Clustering" Proc. of DEXA2002. (2002)
- Related Report
  2002 Annual Research Report
[Publications] Masato Oguchi, Masaru Kitsuregawa: "Runtime Data Declustering based on Bandwidth-on-Demand and its Evaluation over SAN-connected PC Cluster" Proc. of 15th International Conference on Parallel and Distributed Computing Systems (PDCS 2002). 206-213 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Noriko Imafuji, Masaru Kitsuregawa: "Effects of Maximum Flow Algorithm on Identifying Web Community" Proc. of 4th International Workshop on Web Information and Data Management (WIDM 2002). 43-48 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Masashi Toyoda, Masaru Kitsuregawa: "Observing Evolution of Web Community" Proceedings of 11th International WWW Conference (poster). (2002)
- Related Report
  2002 Annual Research Report
[Publications] Masato Oguchi, Masaru Kitsuregawa: "Data Mining on PC Cluster connected with Storage Area Network : Its Preliminary Experimental Results" IEEE International Conference on Communications (ICC2001), G51b.1. (2001)
- Related Report
  2001 Annual Research Report
[Publications] Yitong Wang, Masaru Kitsuregawa: "Link Based Clustering of Web Search Results" Advances in Web-Age Information Management, Second International Conference (WAIM2001), Springer (Lecture Notes in Computer Science). 2118. 225-236 (2001)
- Related Report
  2001 Annual Research Report
[Publications] Masato Oguchi, Masaru Kitsuregawa: "Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments" International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet (SS-GRR2001). 116. (2001)
- Related Report
  2001 Annual Research Report
[Publications] Masashi Toyoda, Masaru Kitsuregawa: "Creating a Web Community Chart for Navigating Related Communities" Conference Proceedings of Hypertext 2001. 103-112 (2001)
- Related Report
  2001 Annual Research Report
[Publications] P. Krishna Reddy, Masaru Kitsuregawa: "An approach to relate the web communities through bipartite graphs" Proceedings of the 2nd International Conference on Web Information Systems Engineering, IEEE Computer Society. (2001)
- Related Report
  2001 Annual Research Report
[Publications] Yitong Wang, Masaru Kitsuregawa: "Use link-based Clustering to Improve Search Results" Proceedings of the 2nd International Conference on Web Information Systems Engineering, IEEE Computer Society. (2001)
- Related Report
  2001 Annual Research Report
