2004 Fiscal Year Final Research Report Summary

Distributed Data Mining Systems for Structured Web Data

Research Project

Project/Area Number: 14580423
Research Category: Grant-in-Aid for Scientific Research (C)
Allocation Type: Single-year Grants
Section: General
Research Field: Intelligent informatics
Research Institution: KYUSHU UNIVERSITY

Principal Investigator

SHOUDAI Takayoshi  Kyushu University, Faculty of Information Science and Electrical Engineering, Department of Informatics, Associate Professor (50226304)

Co-Investigator (Kenkyū-buntansha)
MARUYAMA Osamu  Kyushu University, Faculty of Mathematics, Associate Professor (20282519)
MIYAHARA Tetsuhiro  Hiroshima City University, Faculty of Information Sciences, Associate Professor (90209932)
UCHIDA Tomoyuki  Hiroshima City University, Faculty of Information Sciences, Associate Professor (70264934)
Project Period (FY): 2002–2004
Keywords: data mining / machine learning / inductive inference / tree structured data / web mining / metasearch / network algorithm
Research Abstract

In this research, we studied knowledge discovery from semistructured Web documents such as HTML and XML files. Graph- and tree-based data mining, including the discovery of frequent structures in graph or tree structured data, has been studied extensively. Our target of discovery, however, is neither a merely frequent pattern nor a pattern that is maximally frequent with respect to a syntactic size measure such as the number of vertices. To extract useful information from heterogeneous semistructured Web documents, we instead aim to discover tree structured patterns that are maximal in a semantic rather than syntactic sense and that represent characteristics common to the documents. As a representation of such patterns, we proposed an ordered tree pattern called a term tree: a rooted tree pattern whose children are ordered and which contains internal structured variables.
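
To make the idea of a term tree more concrete, the following is a minimal Python sketch under simplifying assumptions of our own. An ordinary vertex is a `Node` with a label and an ordered list of children, and a structured variable is a `Var` that may keep one child-port subtree. Binding a variable to a "context" (a tree containing exactly one `Hole`) replaces the variable by that tree and re-attaches the child-port subtree at the hole. The names `Node`, `Var`, `Hole`, and `substitute` are hypothetical; this illustrates substituting trees for variables and is not the formal definition of term trees used in the project's papers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Node:
    """Ordinary vertex: a label and an ordered list of children."""
    label: str
    children: List[Tree] = field(default_factory=list)


@dataclass
class Var:
    """Structured variable (simplified, hypothetical model).

    `port` is the subtree below the variable's child port, or None if the
    variable stands for a whole subtree.
    """
    name: str
    port: Optional[Tree] = None


@dataclass
class Hole:
    """Marks where a variable's child-port subtree is re-attached."""


Tree = Union[Node, Var, Hole]


def count_holes(t: Tree) -> int:
    """Number of Hole markers in t (variables are not searched)."""
    if isinstance(t, Hole):
        return 1
    if isinstance(t, Node):
        return sum(count_holes(c) for c in t.children)
    return 0


def plug(ctx: Tree, port: Optional[Tree]) -> Tree:
    """Copy the context ctx, replacing its hole (if any) by port."""
    if isinstance(ctx, Hole):
        return port
    if isinstance(ctx, Node):
        return Node(ctx.label, [plug(c, port) for c in ctx.children])
    raise ValueError("contexts may not contain variables in this sketch")


def substitute(t: Tree, binding: Dict[str, Tree]) -> Tree:
    """Replace each bound variable in t by its context; keep unbound ones."""
    if isinstance(t, Node):
        return Node(t.label, [substitute(c, binding) for c in t.children])
    if isinstance(t, Var):
        port = substitute(t.port, binding) if t.port is not None else None
        if t.name not in binding:
            return Var(t.name, port)
        ctx = binding[t.name]
        expected = 0 if port is None else 1
        if count_holes(ctx) != expected:
            raise ValueError(f"context for {t.name} needs {expected} hole(s)")
        return plug(ctx, port)
    raise ValueError("holes may only appear inside contexts")


# Pattern: a "ul" whose first child is reached through variable x (whose
# child port is an "li" subtree), followed by an ordinary "li" leaf.
pattern = Node("ul", [Var("x", Node("li", [Node("a")])), Node("li")])
# Bind x to the context div(hole): the "li" subtree re-attaches under "div".
tree = substitute(pattern, {"x": Node("div", [Hole()])})
print(tree == Node("ul", [Node("div", [Node("li", [Node("a")])]), Node("li")]))  # prints True
```

Because the variable sits between a parent and a child-port vertex, a single binding can insert a whole path of new vertices above an existing subtree, which is what distinguishes internal structured variables from variables that only stand for leaves.
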
A term tree differs from other representations of tree structured patterns in that it has structured variables, each of which can be replaced by an arbitrary tree. First, we studied the learnability of classes of term tree languages in depth and identified fundamental classes that are learnable in polynomial time. We proved that several classes of term tree languages are polynomial time inductively inferable from positive data, including the class of linear term tree languages with multiple child-port variables, the class of linear term tree languages with contractible variables adjacent to leaves, and the class of linear term tree languages with height-constrained variables and no variable chains. Moreover, we showed that some classes of linear term tree languages are exactly learnable in polynomial time using queries.
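
The project's inference algorithms for term tree languages are not reproduced in this summary. As a much simpler stand-in, the sketch below uses classical anti-unification (least general generalization of first-order terms), restricted so that variables replace only whole subtrees and each variable occurs once, i.e. the pattern is linear. It builds a pattern that every positive example matches by introducing a fresh variable wherever the examples disagree. Unlike term trees, this model has no internal, contractible, or height-constrained variables, and the names `T`, `V`, `generalize`, `infer`, and `matches` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Union


@dataclass(frozen=True)
class T:
    """Ordered labeled tree node; children is a tuple of trees or variables."""
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class V:
    """Variable that may be replaced by any whole subtree (leaf variable)."""
    name: str


Pattern = Union[T, V]


def generalize(a: Pattern, b: Pattern, fresh: Iterator[int]) -> Pattern:
    """Anti-unify two patterns, using a fresh variable at each disagreement."""
    if (isinstance(a, T) and isinstance(b, T) and a.label == b.label
            and len(a.children) == len(b.children)):
        return T(a.label, tuple(generalize(x, y, fresh)
                                for x, y in zip(a.children, b.children)))
    return V(f"x{next(fresh)}")


def infer(examples: List[T]) -> Pattern:
    """Fold anti-unification over all positive examples."""
    fresh = count(1)
    pattern: Pattern = examples[0]
    for example in examples[1:]:
        pattern = generalize(pattern, example, fresh)
    return pattern


def matches(p: Pattern, t: T) -> bool:
    """Linear leaf-variable matching: each variable matches any subtree."""
    if isinstance(p, V):
        return True
    return (p.label == t.label and len(p.children) == len(t.children)
            and all(matches(q, s) for q, s in zip(p.children, t.children)))


ex1 = T("li", (T("a", (T("Paper A"),)), T("span", (T("2004"),))))
ex2 = T("li", (T("a", (T("Paper B"),)), T("span", (T("2003"),))))
p = infer([ex1, ex2])  # li(a(x1), span(x2))
print(matches(p, ex1), matches(p, ex2))  # prints: True True
```

Because every variable is fresh, no consistency check between repeated variables is needed during matching; handling variables that occur at internal positions, as term trees do, requires the considerably more involved algorithms developed in this project.
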
Finally, we developed a metasearch system that uses our efficient learning algorithms for term trees. We implemented the system and showed that it provides effective unified access to multiple existing search sites.
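
The summary does not describe how the metasearch system's wrappers are represented. As a hedged illustration of the general idea only, the sketch below assumes each search site's result page has already been parsed into an ordered tree, and that a pattern with named leaf variables marks the fields to extract. Here the patterns are written by hand, whereas the project learns term tree patterns. Matching a site's pattern against every subtree of its page yields one record per search result, and using the same field names for every site gives a unified view. All names (`T`, `V`, `bind`, `wrap`) and the sample pages are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Union


@dataclass(frozen=True)
class T:
    """Ordered labeled tree node; leaves carry text as their label."""
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class V:
    """Named leaf variable; the name is the field to extract."""
    name: str


Pattern = Union[T, V]


def bind(p: Pattern, t: T) -> Optional[Dict[str, T]]:
    """Match a linear leaf-variable pattern at the root of t."""
    if isinstance(p, V):
        return {p.name: t}
    if p.label != t.label or len(p.children) != len(t.children):
        return None
    bindings: Dict[str, T] = {}
    for q, s in zip(p.children, t.children):
        sub = bind(q, s)
        if sub is None:
            return None
        bindings.update(sub)
    return bindings


def subtrees(t: T) -> Iterator[T]:
    """Yield t and all its descendants in preorder."""
    yield t
    for c in t.children:
        yield from subtrees(c)


def text(t: T) -> str:
    """Concatenate leaf labels in document order."""
    if not t.children:
        return t.label
    return "".join(text(c) for c in t.children)


def wrap(page: T, pattern: Pattern) -> List[Dict[str, str]]:
    """Extract one record per subtree of page that matches pattern."""
    records = []
    for s in subtrees(page):
        b = bind(pattern, s)
        if b is not None:
            records.append({k: text(v) for k, v in b.items()})
    return records


# Two hypothetical sites that lay out the same fields differently.
page_a = T("ul", (T("li", (T("a", (T("Paper 1"),)), T("i", (T("2004"),)))),
                  T("li", (T("a", (T("Paper 2"),)), T("i", (T("2003"),))))))
pattern_a = T("li", (T("a", (V("title"),)), T("i", (V("year"),))))
page_b = T("table", (T("tr", (T("td", (T("2002"),)), T("td", (T("Paper 3"),)))),))
pattern_b = T("tr", (T("td", (V("year"),)), T("td", (V("title"),))))

results = wrap(page_a, pattern_a) + wrap(page_b, pattern_b)
for r in results:
    print(r["title"], r["year"])  # Paper 1 2004 / Paper 2 2003 / Paper 3 2002
```

Keeping one pattern per site and a shared set of field names is what lets differently structured result pages be merged into a single list.
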

  • Research Products (34 results)

  • [Journal Article] Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents (2004)

    • Author(s)
      Tetsuhiro Miyahara
    • Journal Title

      Proc. 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 3056

      Pages: 133-144

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Polynomial Time Inductive Inference of Ordered Tree Languages with Height-Constrained Variables from Positive Data (2004)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 8th Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag, LNAI 3157

      Pages: 211-220

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Learning of Ordered Tree Languages with Height-Bounded Variables Using Queries (2004)

    • Author(s)
      Satoshi Matsumoto
    • Journal Title

      Proc. 15th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 3244

      Pages: 425-439

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Automatic Wrapper Generation for Metasearch using Ordered Tree Structured Patterns (2004)

    • Author(s)
      Kazuhide Aikou
    • Journal Title

      Proc. 17th Australian Joint Conference on Artificial Intelligence, Springer-Verlag, LNAI 3339

      Pages: 1030-1035

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents (2004)

    • Author(s)
      T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, H. Ueda
    • Journal Title

      Proc. 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 3056

      Pages: 133-144

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Polynomial Time Inductive Inference of Ordered Tree Languages with Height-Constrained Variables from Positive Data (2004)

    • Author(s)
      Y. Suzuki, T. Shoudai, T. Miyahara, S. Matsumoto
    • Journal Title

      Proc. 8th Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag, LNAI 3157

      Pages: 211-220

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Learning of Ordered Tree Languages with Height-Bounded Variables Using Queries (2004)

    • Author(s)
      S. Matsumoto, T. Shoudai
    • Journal Title

      Proc. 15th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 3244

      Pages: 425-439

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Automatic Wrapper Generation for Metasearch using Ordered Tree Structured Patterns (2004)

    • Author(s)
      K. Aikou, Y. Suzuki, T. Shoudai, T. Miyahara
    • Journal Title

      Proc. 17th Australian Joint Conference on Artificial Intelligence, Springer-Verlag, LNAI 3339

      Pages: 1030-1035

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] A polynomial time matching algorithm of structured ordered tree patterns for data mining from semistructured data (2003)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 12th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2583

      Pages: 270-284

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Extraction of tag tree patterns with contractible variables from irregular semistructured data (2003)

    • Author(s)
      Tetsuhiro Miyahara
    • Journal Title

      Proc. 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2637

      Pages: 430-436

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Finding frequent subgraphs from graph structured data with geometric information and its application to lossless compression (2003)

    • Author(s)
      Yuko Itokawa
    • Journal Title

      Proc. 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2637

      Pages: 582-594

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Efficient learning of unlabeled term trees with contractible variables from positive data (2003)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 13th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2835

      Pages: 347-364

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] An effective grammar-based compression algorithm for tree structured data (2003)

    • Author(s)
      Kazunori Yamagata
    • Journal Title

      Proc. 13th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2835

      Pages: 383-400

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Efficient learning of ordered and unordered tree patterns with contractible variables (2003)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 14th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2842

      Pages: 114-128

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Learning of finite unions of tree patterns with repeated internal structured variables from queries (2003)

    • Author(s)
      Satoshi Matsumoto
    • Journal Title

      Proc. 14th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2842

      Pages: 144-158

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] A polynomial time matching algorithm of structured ordered tree patterns for data mining from semistructured data (2003)

    • Author(s)
      Y. Suzuki, K. Inomae, T. Shoudai, T. Miyahara, T. Uchida
    • Journal Title

      Proc. 12th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2583

      Pages: 270-284

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Extraction of tag tree patterns with contractible variables from irregular semistructured data (2003)

    • Author(s)
      T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, S. Hirokawa, K. Takahashi, H. Ueda
    • Journal Title

      Proc. 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2637

      Pages: 430-436

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Finding frequent subgraphs from graph structured data with geometric information and its application to lossless compression (2003)

    • Author(s)
      Y. Itokawa, T. Uchida, T. Shoudai, T. Miyahara, Y. Nakamura
    • Journal Title

      Proc. 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2637

      Pages: 582-594

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Efficient learning of unlabeled term trees with contractible variables from positive data (2003)

    • Author(s)
      Y. Suzuki, T. Shoudai, S. Matsumoto, T. Uchida
    • Journal Title

      Proc. 13th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2835

      Pages: 347-364

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] An effective grammar-based compression algorithm for tree structured data (2003)

    • Author(s)
      K. Yamagata, T. Uchida, T. Shoudai, Y. Nakamura
    • Journal Title

      Proc. 13th International Conference on Inductive Logic Programming, Springer-Verlag, LNAI 2835

      Pages: 383-400

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Efficient learning of ordered and unordered tree patterns with contractible variables (2003)

    • Author(s)
      Y. Suzuki, T. Shoudai, S. Matsumoto, T. Uchida, T. Miyahara
    • Journal Title

      Proc. 14th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2842

      Pages: 114-128

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Learning of finite unions of tree patterns with repeated internal structured variables from queries (2003)

    • Author(s)
      S. Matsumoto, Y. Suzuki, T. Shoudai, T. Miyahara, T. Uchida
    • Journal Title

      Proc. 14th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2842

      Pages: 144-158

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Discovery of frequent tag tree patterns in semistructured web documents (2002)

    • Author(s)
      Tetsuhiro Miyahara
    • Journal Title

      Proc. 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2002), Springer-Verlag, LNAI 2336

      Pages: 341-355

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Extracting characteristic structures among words in semistructured documents (2002)

    • Author(s)
      Kazuyoshi Furukawa
    • Journal Title

      Proc. 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2002), Springer-Verlag, LNAI 2336

      Pages: 356-367

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Polynomial time inductive inference of ordered tree patterns with internal structured variables from positive data (2002)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 15th Annual Conference on Computational Learning Theory, Springer-Verlag, LNAI 2375

      Pages: 169-184

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Ordered term tree languages which are polynomial time inductively inferable from positive data (2002)

    • Author(s)
      Yusuke Suzuki
    • Journal Title

      Proc. 13th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2533

      Pages: 188-203

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Toward drawing an atlas of hypothesis classes (2002)

    • Author(s)
      Osamu Maruyama
    • Journal Title

      Proc. 5th International Conference on Discovery Science, Springer-Verlag, LNCS 2534

      Pages: 220-232

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Learning of finite unions of tree patterns with internal structured variables from queries (2002)

    • Author(s)
      Satoshi Matsumoto
    • Journal Title

      Proc. 15th Australian Joint Conference on Artificial Intelligence, Springer-Verlag, LNAI 2557

      Pages: 523-534

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Discovery of frequent tag tree patterns in semistructured web documents (2002)

    • Author(s)
      T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, H. Ueda
    • Journal Title

      Proc. 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2336

      Pages: 341-355

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Extracting characteristic structures among words in semistructured documents (2002)

    • Author(s)
      K. Furukawa, T. Uchida, K. Yamada, T. Miyahara, T. Shoudai, Y. Nakamura
    • Journal Title

      Proc. 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag, LNAI 2336

      Pages: 356-367

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Polynomial time inductive inference of ordered tree patterns with internal structured variables from positive data (2002)

    • Author(s)
      Y. Suzuki, R. Akanuma, T. Shoudai, T. Miyahara, T. Uchida
    • Journal Title

      Proc. 15th Annual Conference on Computational Learning Theory, Springer-Verlag, LNAI 2375

      Pages: 169-184

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Ordered term tree languages which are polynomial time inductively inferable from positive data (2002)

    • Author(s)
      Y. Suzuki, T. Shoudai, T. Uchida, T. Miyahara
    • Journal Title

      Proc. 13th Workshop on Algorithmic Learning Theory, Springer-Verlag, LNAI 2533

      Pages: 188-203

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Toward drawing an atlas of hypothesis classes (2002)

    • Author(s)
      O. Maruyama, T. Shoudai, S. Miyano
    • Journal Title

      Proc. 5th International Conference on Discovery Science, Springer-Verlag, LNCS 2534

      Pages: 220-232

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Learning of finite unions of tree patterns with internal structured variables from queries (2002)

    • Author(s)
      S. Matsumoto, T. Shoudai, T. Miyahara, T. Uchida
    • Journal Title

      Proc. 15th Australian Joint Conference on Artificial Intelligence, Springer-Verlag, LNAI 2557

      Pages: 523-534

    • Description
      From the Final Research Report Summary (English)

Published: 2006-07-11  

Powered by NII kakenhi