
2000 Fiscal Year Final Research Report Summary

Development of Intelligent Full-text Search System using Efficient Pattern Matching Algorithms on Compressed Data

Research Project

Project/Area Number 10558047
Research Category

Grant-in-Aid for Scientific Research (B).

Allocation Type Single-year Grants
Section Developmental Research
Research Field Computer Science
Research Institution KYUSHU UNIVERSITY

Principal Investigator

SHINOHARA Ayumi  KYUSHU UNIVERSITY, Department of Informatics, Assoc. Prof., Graduate School, Faculty of Information Science and Electrical Engineering, Associate Professor (00226151)

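Co-Investigator (Kenkyū-buntansha) SHIMOZONO Shinichi  KYUSHU UNIVERSITY, Department of Artificial Intelligence, Assoc. Prof., Faculty of Information Engineering, Associate Professor (70243988)
SAKAMOTO Hiroshi  KYUSHU UNIVERSITY, Department of Informatics, Res. Assoc., Graduate School, Faculty of Information Science and Electrical Engineering, Research Associate (50315123)
TAKEDA Masayuki  KYUSHU UNIVERSITY, Department of Informatics, Assoc. Prof., Graduate School, Faculty of Information Science and Electrical Engineering, Associate Professor (50216909)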
Project Period (FY) 1998 – 2000
Keywords Pattern matching / Data compression / Compressed pattern matching / Data mining / Machine learning / Subsequence automaton / Straight-line program
Research Abstract

On the theoretical side of compressed pattern matching, we introduced a unified framework, called the collage system, that covers various dictionary-based data compression methods. We developed both Knuth-Morris-Pratt type and Boyer-Moore type pattern matching algorithms for collage systems. We adapted these algorithms to the byte-pair encoding (BPE) compression method, which yielded the fastest compressed pattern matching algorithm in practice. Multiple pattern matching and approximate string matching were also handled successfully within collage systems. We also applied the method to Sequitur, another promising compression program, and verified its performance. Moreover, we studied efficient fully compressed pattern matching for balanced straight-line programs, in which both the text and the pattern are compressed. We also developed an online algorithm that, given a set of strings, constructs a subsequence automaton accepting every subsequence of any string in the set. The algorithm is the fastest for this task, and we verified that it is quite useful for accelerating a knowledge discovery system. Concerning knowledge discovery from databases, we studied the learnability of tree transformation rules from examples and the search for optimal word-association rules in large text databases.
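
As a rough illustration of the idea behind Knuth-Morris-Pratt type matching on byte-pair encoded text, the sketch below counts pattern occurrences without decompressing the text. It is not the project's algorithm. The function names (`bpe_compress`, `kmp_automaton`, `count_in_compressed`), the `max_rules` cutoff, and the dense per-symbol tables are assumptions made for illustration only, and the tables cost O(m) space per symbol for a pattern of length m.

```python
from collections import Counter


def bpe_compress(text, max_rules=50):
    """Byte-pair encoding sketch: repeatedly replace the most frequent
    adjacent pair by a fresh symbol. Returns (tokens, rules) with
    rules[new_symbol] = (left, right), in creation order."""
    tokens = list(text)
    rules = {}
    for next_id in range(max_rules):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        new_sym = ("N", next_id)  # tuple symbols cannot clash with characters
        rules[new_sym] = (left, right)
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (left, right):
                out.append(new_sym)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, rules


def kmp_automaton(pattern, alphabet):
    """delta[q][c] = next KMP state after reading c in state q.
    Assumes a non-empty pattern whose characters are all in alphabet."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for c in alphabet:
        delta[0][c] = 1 if c == pattern[0] else 0
    x = 0  # state reached after reading pattern[1:q]
    for q in range(1, m + 1):
        for c in alphabet:
            delta[q][c] = delta[x][c]
        if q < m:
            delta[q][pattern[q]] = q + 1
            x = delta[x][pattern[q]]
    return delta


def count_in_compressed(tokens, rules, pattern, alphabet):
    """Count (possibly overlapping) occurrences of pattern in the text
    represented by tokens and rules, one table lookup per token."""
    alphabet = set(alphabet) | set(pattern)
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)
    jump, hits = {}, {}
    for c in alphabet:
        jump[c] = [delta[q][c] for q in range(m + 1)]
        hits[c] = [1 if jump[c][q] == m else 0 for q in range(m + 1)]
    # A rule X -> (Y, Z) composes the tables of Y and Z.
    for sym, (left, right) in rules.items():
        jl, jr = jump[left], jump[right]
        jump[sym] = [jr[jl[q]] for q in range(m + 1)]
        hits[sym] = [hits[left][q] + hits[right][jl[q]] for q in range(m + 1)]
    state = total = 0
    for t in tokens:
        total += hits[t][state]
        state = jump[t][state]
    return total
```

A call such as `count_in_compressed(*bpe_compress(text), pattern, set(text))` then scans one compressed symbol per step. Reporting match positions, rather than counts, would need extra bookkeeping that this sketch omits.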

  • Research Products

    (44 results)

All Publications (44 results)

  • [Publications] M.Miyazaki et al.: "Speeding up the pattern matching machine for compressed texts"Trans. Information Processing Society of Japan. 39. 2638-2648 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] H.Sakamoto: "Finding a one-variable pattern from incomplete data"Proc. 9th International Conference on Algorithmic Learning Theory. LNAI1501. 234-246 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] Y.Shibata et al.: "Pattern Matching in Text Compressed by Using Antidictionaries"Proc.10th Ann.Symp.Combinatorial Pattern Matching. LNCS1645. 37-49 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] T.Kida et al.: "Shift-And Approach to Pattern Matching in LZW Compressed Text"Proc.10th Ann.Symp.Combinatorial Pattern Matching. LNCS1645. 1-13 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] T.Kida et al.: "A Unifying Framework for Compressed Pattern Matching"Proc.6th Int.Symp.String Processing and Information Retrieval. 89-96 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] S.Shimozono: "Alphabet indexing for approximating features of symbols"Theoretical Computer Science. 210. 245-260 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] Y.Shibata et al.: "Speeding Up Pattern Matching by Text Compression"Proc.4th Italian Conf.on Algorithms and Complexity. LNCS1767. 306-316 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] Y.Shibata and T.Matsumoto and M.Takeda and A.Shinohara and S.Arikawa: "A Boyer-Moore type algorithm for compressed pattern matching"Proc.11th Ann.Symp.on Combinatorial Pattern Matching. LNCS1848. 181-194 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] H.Sakamoto et al.: "Identification of tree translation rules from examples"Proc.5th International Colloquium on Grammatical Inference. LNAI1891. 240-255 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] S.Shimozono et al.: "On the hardness of approximating the minimum consistent acyclic DFA and decision diagram"Information Processing Letters. 66. 165-170 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] H.Arimura,S.Shimozono: "Maximizing agreement with a classification by bounded or unbounded number of words"Proc.6th Ann.Int.Symp.on Algorithms and Computation. LNAI1533. 39-48 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] M.Miyazaki,A.Shinohara and M.Takeda: "An Improved Pattern Matching Algorithm for Strings in terms of Straight-line Programs"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] T.Kida,M.Takeda,A.Shinohara,M.Miyazaki and S.Arikawa: "Multiple Pattern Matching in LZW Compressed Text"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] Y.Shibata and M.Takeda and A.Shinohara and S.Arikawa: "Pattern Matching in Text Compressed by Using Antidictionaries"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] S.Shimozono,H.Arimura and S.Arikawa: "Efficient discovery of optimal word-association patterns in large text databases"New Generation Computing. 18. 49-60 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] H.Sakamoto,H.Arimura and S.Arikawa: "Identification of Tree Translation Rules from Examples"Proc.5th International Colloquium on Grammatical Inference. LNAI1891. 241-255 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] M.Hirao and A.Shinohara and M.Takeda and S.Arikawa: "Fully compressed pattern matching algorithm for balanced straight-line programs"Proc.7th International Symposium on String Processing and Information Retrieval (SPIRE2000). 132-138 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] H.Hoshino,A.Shinohara,M.Takeda and S.Arikawa: "Online construction of subsequence automata for multiple texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 146-152 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] T.Matsumoto,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Bit-parallel approach to approximate string matching in compressed texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 221-228 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] G.Navarro,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Faster Approximate String Matching over Compressed Text"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] S.Mitarai,M.Hirao,T.Matsumoto,A.Shinohara,M.Takeda and S.Arikawa: "Compressed Pattern Matching for Sequitur"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] M.Takeda et al.: "Speeding up string pattern matching by text compression : The dawn of a new era"Trans. Information Processing Society of Japan. 42(3). (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
  • [Publications] M.Miyazaki et al.: "Speeding up the pattern matching machine for compressed texts."Trans. Information Processing Society of Japan. 39(9). 2638-2648 (1998)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] H.Sakamoto: "Finding a one-variable pattern from incomplete data."Proc. 9th International Conference on Algorithmic Learning Theory, LNAI. 1501. 234-246 (1998)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] Y.Shibata et al: "Pattern Matching in Text Compressed by Using Antidictionaries"Proc. 10th Ann. Symp. Combinatorial Pattern Matching. LNCS1645. 37-49 (1999)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] T.Kida et al: "Shift-And Approach to Pattern Matching in LZW Compressed Text"Proc. 10th Ann. Symp. Combinatorial Pattern Matching. LNCS1645. 1-13 (1999)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] T.Kida et al: "A Unifying Framework for Compressed Pattern Matching"Proc. 6th Int. Symp. String Processing and Information Retrieval. 89-96 (1999)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] S.Shimozono: "Alphabet indexing for approximating features of symbols"Theoretical Computer Science. 210. 245-260 (1999)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] Y.Shibata et al: "Speeding Up Pattern Matching by Text Compression"Proc. 4th Italian Conf. on Algorithms and Complexity. LNCS1767. 306-316 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] Y.Shibata and T.Matsumoto and M.Takeda and A.Shinohara and S.Arikawa: "A Boyer-Moore type algorithm for compressed pattern matching"Proc. 11th Ann. Symp. on Combinatorial Pattern Matching. LNCS1848. 181-194 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] H.Sakamoto et al: "Identification of tree translation rules from examples"Proc. 5th International Colloquium on Grammatical Inference. LNAI1891. 240-255 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] S.Shimozono et al: "On the hardness of approximating the minimum consistent acyclic DFA and decision diagram"Information Processing Letters. 66. 165-170 (1998)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] H.Arimura, S.Shimozono: "Maximizing agreement with a classification by bounded or unbounded number of words"Proc. 6th Ann. Int. Symp. on Algorithms and Computation. LNAI1533. 39-48 (1998)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] M.Miyazaki, A.Shinohara and M.Takeda: "An Improved Pattern Matching Algorithm for Strings in terms of Straight-line Programs"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] T.Kida, M.Takeda, A.Shinohara, M.Miyazaki and S.Arikawa: "Multiple Pattern Matching in LZW Compressed Text"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] Y.Shibata and M.Takeda and A.Shinohara and S.Arikawa: "Pattern Matching in Text Compressed by Using Antidictionaries"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] S.Shimozono, H.Arimura and S.Arikawa: "Efficient discovery of optimal word-association patterns in large text databases"New Generation Computing. 18. 49-60 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] H.Sakamoto, H.Arimura and S.Arikawa: "Identification of Tree Translation Rules from Examples"Proc. 5th International Colloquium on Grammatical Inference. LNAI1891. 241-255 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] M.Hirao and A.Shinohara and M.Takeda and S.Arikawa: "Fully compressed pattern matching algorithm for balanced straight-line programs"Proc. 7th International Symposium on String Processing and Information Retrieval. 132-138 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] H.Hoshino, A.Shinohara, M.Takeda and S.Arikawa: "Online construction of subsequence automata for multiple texts"Proc. 7th International Symposium on String Processing and Information Retrieval. 146-152 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] T.Matsumoto, T.Kida, M.Takeda, A.Shinohara and S.Arikawa: "Bit-Parallel Approach to approximate string matching in compressed texts"Proc. 7th International Symposium on String Processing and Information Retrieval. 222-228 (2000)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] G.Navarro, T.Kida, M.Takeda, A.Shinohara and S.Arikawa: "Faster Approximate String Matching over Compressed Text"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] S.Mitarai, M.Hirao, T.Matsumoto, A.Shinohara, M.Takeda and S.Arikawa: "Compressed Pattern Matching for Sequitur"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (English)"
  • [Publications] M.Takeda et al: "Speeding up string pattern matching by text compression : The dawn of a new era"Trans. Information Processing Society of Japan. 42(3). (2001)

    • Description
      From "Final Research Report Summary (English)"

Published: 2002-03-26  
