Study of High-speed Data Mining Algorithms from Massive Data Streams

Research Project

Project/Area Number 15300036
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type: Single-year Grants
Section: General
Research Field: Media informatics/Database
Research Institution: Kyushu University (2003, 2005)
Hokkaido University (2004)

Principal Investigator

IKEDA Daisuke (2005)  Kyushu Univ., University Library, Assoc. Prof. (00294992)

ARIMURA Hiroki (2003-2004)  Hokkaido Univ., Grad. School of Info. Sci., Prof. (20222763)

Co-Investigator (Kenkyū-buntansha) TAKEDA Masayuki  Kyushu Univ., Grad. School of Info. Sci. and Elec. Eng., Prof. (50216909)
SHINOHARA Ayumi  Tohoku Univ., Grad. School of Info. Sci., Prof. (00226151)
KIDA Takuya  Hokkaido Univ., Grad. School of Info. Sci., Assoc. Prof. (70343316)
KASAHARA Yoshiaki  Kyushu Univ., Computing and Communication Center, Res. Assoc. (60284577)
ISHINO Akira  Kyushu Univ., Office for Information of Univ. Evaluation, Res. Assoc. (10315129)
Project Period (FY) 2003 – 2005
Project Status Completed (Fiscal Year 2005)
Budget Amount
¥15,600,000 (Direct Cost: ¥15,600,000)
Fiscal Year 2005: ¥2,700,000 (Direct Cost: ¥2,700,000)
Fiscal Year 2004: ¥5,700,000 (Direct Cost: ¥5,700,000)
Fiscal Year 2003: ¥7,200,000 (Direct Cost: ¥7,200,000)
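Keywords: data stream / data mining / XML data / semi-structured data / pattern matching / sequence discovery / XPath / tree mining / semi-structured data technology / high-speed data streams / information extraction / knowledge acquisition / large-scale network data / online semi-structured data retrieval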
Research Abstract

In this research, we investigated a high-speed online knowledge discovery system for extracting useful information from massive semi-structured data streams. This year in particular, on the theoretical side, we further extended the theory of efficient pattern matching and pattern discovery methods for online streams. On the application side, we conducted a series of experiments on collecting and analyzing network data from real high-speed networks in a large organization. We also published the results obtained over the three-year research period. Specifically, we pursued the following topics:
(1) Survey on semi-structured data: We summarized the work on stream data mining studied in this project over the last three years and published it as a survey in an academic journal.
(2) Study on streaming pattern matching technology for semi-structured data: We developed an efficient method for tree pattern matching with horizontal wildcards using bit-parallel techniques, which could potentially yield a drastic speed-up for the XPath and XQuery pattern matching languages on huge XML data.
(3) Study on sequential and streaming pattern discovery technology for semi-structured data: We developed efficient algorithms for finding interesting patterns in massive data streams for various classes of complex patterns and motifs. This year we also published the pattern discovery algorithms developed last year, one of which received the 2004 JSAI SIG Award.
(4) Empirical study on knowledge discovery from real massive network data: As an application, we carried out a series of studies on data collection and online analysis for the high-speed, large-scale network of a mid-sized organization at Kyushu University. These experiments are expected to give insights for future research on efficient pattern matching and discovery algorithms for high-speed streaming data.
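
To illustrate the bit-parallel idea mentioned in item (2), the following is a minimal sketch of the classical Shift-And algorithm for exact string matching over a stream. It is not the project's tree matching algorithm with horizontal wildcards; it only shows how a set of partial matches can be packed into one machine word and updated with a shift and a mask per input symbol. The function names `build_masks` and `shift_and_stream` are hypothetical and chosen for this example.

```python
def build_masks(pattern):
    """Map each symbol to a bitmask with bit i set where pattern[i] == symbol."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def shift_and_stream(pattern, stream):
    """Yield the end position of every occurrence of pattern in stream.

    The stream is read one symbol at a time, so any iterable works,
    including one that is too large to hold in memory.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)  # bit that marks a complete match
    state = 0
    for pos, ch in enumerate(stream):
        # Bit i of state is set iff pattern[:i + 1] is a suffix of the
        # text read so far; all prefixes are advanced in one step.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            yield pos


if __name__ == "__main__":
    print(list(shift_and_stream("aba", "xababa")))
```

Each input symbol costs a constant number of word operations as long as the pattern fits in a machine word, which is what makes this family of techniques attractive for high-speed streams.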

Report

(4 results)
  • 2005 Annual Research Report   Final Research Report Summary
  • 2004 Annual Research Report
  • 2003 Annual Research Report

Research Products

(40 results: 28 journal articles, 12 publications)

  • [Journal Article] Extraction of Bibliographic Information from the Web by Simple Substring Matching (2006)

    • Author(s)
      Hideki Matsumoto, Shosaku Tanaka, Daisuke Ikeda, Keita Hiraki
    • Journal Title

      30th Digital Library Workshop (oral presentation)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Faster Pattern Matching Algorithm for Arc-Annotated Sequences (2006)

    • Author(s)
      Takuya Kida
    • Journal Title

      Lecture Notes in Computer Science To appear

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Passive Server Detection and Banner Collection (2006)

    • Author(s)
      Y.Kasahara
    • Journal Title

      Core University Seminar on Next Generation Internet

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams (2006)

    • Author(s)
      T.Asai, K.Abe, S.Kawasoe, H.Arimura, S.Arikawa
    • Journal Title

      Lecture Notes in Computer Science To appear

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Faster Pattern Matching Algorithm for Arc-Annotated Sequences (2006)

    • Author(s)
      Takuya Kida
    • Journal Title

      Proc.-Federation on the Web, LNAI, Springer-Verlag (to appear)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Passive Server Detection and Banner Collection (2006)

    • Author(s)
      Y.Kasahara
    • Journal Title

      JSPS Core University Seminar on Next Generation Internet

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams (2006)

    • Author(s)
      T.Asai, K.Abe, S.Kawasoe, H.Arimura, S.Arikawa
    • Journal Title

      Report from the 2004 Annual Meeting of Japan Society for Artificial Intelligence (JSAI2004), JSAI, LNAI, Katsumi Nitta et al. (eds.), Springer-Verlag (in printing)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Faster Pattern Matching Algorithm for Arc-Annotated Sequences (2006)

    • Author(s)
      Takuya Kida
    • Journal Title

      Lecture Notes in Computer Science (To appear)

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Passive Server Detection and Banner Collection (2006)

    • Author(s)
      Y.Kasahara
    • Journal Title

      2006 JSPS Core-University Seminar on Next Generation Internet

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Efficient Algorithms for Finding Frequent Substructures from Semi-structured Data Streams (2006)

    • Author(s)
      T.Asai, K.Abe, S.Kawasoe, H.Arimura, S.Arikawa
    • Journal Title

      Lecture Notes in Computer Science (To appear)

    • Related Report
      2005 Annual Research Report
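  • [Journal Article] A Common Pattern Discovery Algorithm Based on Substring Amplification (2005)

    • Author(s)
      Daisuke Ikeda, Yasuhiro Yamada, Sachio Hirokawa
    • Journal Title

      IPSJ Transactions on Mathematical Modeling and Its Applications 46・SIG 2(TOM 11)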

      Pages: 56-66

    • NAID

      110002914186

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] An Approach to Analyzing Correlation between Songs/Artists Using iTMS Playlists (2005)

    • Author(s)
      Y.Dou, E.Itoh, S.Hirokawa, D.Ikeda
    • Journal Title

      Proc. International Conference on Intelligent Agents, Web Technology and Internet Commerce (IAWTIC'2005)

      Pages: 28-30

    • NAID

      120006654585

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] A Bit-parallel Tree Matching Algorithm for Patterns with Horizontal VLDC's (2005)

    • Author(s)
      Hisashi Tsuji, Akira Ishino, Masayuki Takeda
    • Journal Title

      Lecture Notes in Computer Science 3772

      Pages: 388-398

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Annual Research Report 2005 Final Research Report Summary
  • [Journal Article] Trends in Mining Technologies for Large-Scale Data Streams (2005)

    • Author(s)
      Hiroki Arimura
    • Journal Title

      IEICE Transactions D-1 (Japanese Edition) J88-D-1・3

      Pages: 563-575

    • NAID

      110003207353

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Annual Research Report 2005 Final Research Report Summary
  • [Journal Article] Special Issue on Algorithmic Learning Theory (2005)

    • Author(s)
      Sanjay Jain, Hiroki Arimura
    • Journal Title

      Theoretical Computer Science 348・1-348・2

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] A Polynomial Space and Polynomial Delay Algorithm for Enumeration of Maximal Motifs in a Sequence (2005)

    • Author(s)
      Hiroki Arimura, Takeaki Uno
    • Journal Title

      Lecture Notes in Computer Science 3827

      Pages: 724-737

    • NAID

      110003225066

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] An Approach to Analyzing Correlation between Songs/Artists Using iTMS Playlists (2005)

    • Author(s)
      Y.Dou, E.Itoh, S.Hirokawa, D.Ikeda
    • Journal Title

      Proc.International Conference on Intelligent Agents, Web Technology and Internet Commerce (IAWTIC'2005)

      Pages: 28-30

    • NAID

      120006654585

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] A Bit-parallel Tree Matching Algorithm for Patterns with Horizontal VLDC's (2005)

    • Author(s)
      Hisashi Tsuji, Akira Ishino, Masayuki Takeda
    • Journal Title

      Proc.12th International Symposium on String Processing and Information Retrieval (SPIRE 2005), Lecture Notes in Computer Science 3772, Springer

      Pages: 388-398

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Recent Development of Stream Data Mining Algorithms (in Japanese) (2005)

    • Author(s)
      H.Arimura
    • Journal Title

      IEICE Transactions on Information and Systems Vol.J89-D, No.2

      Pages: 172-183

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Special Issue on Algorithmic Learning Theory (2005)

    • Author(s)
      Sanjay Jain, Hiroki Arimura
    • Journal Title

      Theoretical Computer Science 348(1-2)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] A Polynomial Space and Polynomial Delay Algorithm for Enumeration of Maximal Motifs in a Sequence (2005)

    • Author(s)
      Hiroki Arimura, Takeaki Uno
    • Journal Title

      Proc.the 16th Annual International Symposium on Algorithms and Computation (ISAAC'05), LNCS 3827, Springer

    • NAID

      110003225066

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2005 Final Research Report Summary
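  • [Journal Article] A Common Pattern Discovery Algorithm Based on Substring Amplification (2005)

    • Author(s)
      Daisuke Ikeda, Yasuhiro Yamada, Sachio Hirokawa
    • Journal Title

      IPSJ Transactions on Mathematical Modeling and Its Applications 46・SIG 2(TOM 11)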

      Pages: 56-66

    • NAID

      110002914186

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Trends in Mining Technologies for Large-Scale Data Streams (2005)

    • Author(s)
      Hiroki Arimura
    • Journal Title

      IEICE Transactions (Japanese Edition) J88-D-I・2 (in press)

    • NAID

      110003207353

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Mining Technologies for Data Streams (2005)

    • Author(s)
      Hiroki Arimura, Takuya Kida
    • Journal Title

      IPSJ Magazine (Joho Shori), Information Processing Society of Japan 46・1

      Pages: 4-11

    • NAID

      110002768327

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Efficient Substructure Discovery from Large Semi-structured Data (2004)

    • Author(s)
      H.Arimura, H.Sakamoto, and 4 others
    • Journal Title

      IEICE Transactions on Information and Systems E87-D・12

      Pages: 2754-2763

    • NAID

      110003213885

    • Related Report
      2004 Annual Research Report
  • [Journal Article] An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases (2004)

    • Author(s)
      H.Arimura and 3 others
    • Journal Title

      Proc.the 7th International Conference on Discovery Science (DS'04) LNAI3245

      Pages: 16-30

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Finding Optimal Pairs of Cooperative and Competing Patterns with Bounded Distance (2004)

    • Author(s)
      A.Shinohara, M.Takeda, and 5 others
    • Journal Title

      Proc.The 7th International Conference on Discovery Science (DS 2004) LNAI3245

      Pages: 32-46

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Pattern matching with taxonomic information (2004)

    • Author(s)
      T.Kida, H.Arimura
    • Journal Title

      Proc.Asia Information Retrieval Symposium (AIRS'04)

    • NAID

      120000959147

    • Related Report
      2004 Annual Research Report
  • [Publications] Hiroshi Sakamoto et al.: "Learning Elementary Formal Systems with Queries" Theoretical Computer Science. 298(1). 21-50 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Tatsuya Asai et al.: "Discovering Frequent Substructures in Large Unordered Trees" Proc. the 6th International Conference on Discovery Science (DS'03). 2843. 47-61 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Takeaki Uno et al.: "LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets" Proc. ICDM'03 Workshop on Frequent Itemset Mining Implementations (FIMI'03). (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Hiroki Arimura: "Learning in Computational Learning Theory" Journal of the Japanese Society for Artificial Intelligence. 18・5. 531-536 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Tatsuya Asai et al.: "Pattern Discovery Techniques in Semi-structured Data Mining" IEICE Transactions (Japanese Edition). J87-D-1・2. 111-139 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Hiroki Arimura: "Efficient Text and Semi-structured Data Mining: Knowledge Discovery in the Cyberspace" The first Franco-Japanese Workshop on Information Search, Integration and Personalization (ISIP'03). (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Masayuki Takeda et al.: "Discovering Most Classificatory Patterns for Very Expressive Pattern Classes" Lecture Notes in Computer Science. 2843. 486-493 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Shunsuke Inenaga et al.: "Compact Directed Acyclic Word Graphs for a Sliding Window" Journal of Discrete Algorithms. (to appear). (2004)

    • Related Report
      2003 Annual Research Report
  • [Publications] Shunsuke Inenaga et al.: "Linear-time off-line text compression by longest-first substitution" Lecture Notes in Computer Science. 2857. 137-152 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Hideo Bannai et al.: "Inferring Strings from Graphs and Arrays" Lecture Notes in Computer Science. 2747. 208-217 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Kensuke Baba et al.: "On the length of the minimum solution of word equations in one variable" Lecture Notes in Computer Science. 2747. 189-197 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Satoru Miyamoto et al.: "Ternary Directed Acyclic Word Graphs" Lecture Notes in Computer Science. 2759. 120-130 (2003)

    • Related Report
      2003 Annual Research Report

Published: 2003-04-01   Modified: 2016-04-21  