Development of Intelligent Full-text Search System using Efficient Pattern Matching Algorithms on Compressed Data

Research Project

Project/Area Number 10558047
Research Category

Grant-in-Aid for Scientific Research (B).

Allocation Type Single-year Grants
Section Developmental Research
Research Field Computer Science
Research Institution KYUSHU UNIVERSITY

Principal Investigator

SHINOHARA Ayumi  KYUSHU UNIVERSITY, Graduate School, Faculty of Information Science and Electrical Engineering, Department of Informatics, Assoc. Prof. (00226151)

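Co-Investigator (Kenkyū-buntansha)

SHIMOZONO Shinichi  KYUSHU UNIVERSITY, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Assoc. Prof. (70243988)
SAKAMOTO Hiroshi  KYUSHU UNIVERSITY, Graduate School, Faculty of Information Science and Electrical Engineering, Department of Informatics, Res. Assoc. (50315123)
TAKEDA Masayuki  KYUSHU UNIVERSITY, Graduate School, Faculty of Information Science and Electrical Engineering, Department of Informatics, Assoc. Prof. (50216909)
ZEUGMANN Thomas (ZEUGMANN Tho)  KYUSHU UNIVERSITY, Graduate School of Information Science and Electrical Engineering, Prof. (60264016)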
Project Period (FY) 1998 – 2000
Project Status Completed (Fiscal Year 2000)
Budget Amount
¥10,400,000 (Direct Cost: ¥10,400,000)
Fiscal Year 2000: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1999: ¥3,900,000 (Direct Cost: ¥3,900,000)
Fiscal Year 1998: ¥4,500,000 (Direct Cost: ¥4,500,000)
Keywords Pattern matching / Data compression / Compressed pattern matching / Data mining / Machine learning / Subsequence automaton / Straight-line program / String matching / Full-text search / Computational learning theory / BPE compression / Lempel-Ziv compression
Research Abstract

On the theoretical side of compressed pattern matching, we introduced a unified framework, called the collage system, that covers various dictionary-based data compression methods. We developed both Knuth-Morris-Pratt type and Boyer-Moore type pattern matching algorithms for collage systems. Adapting these algorithms to the Byte-Pair-Encoding (BPE) compression method yields the fastest compressed pattern matching algorithm in practice. Multiple pattern matching and approximate string matching were also handled successfully within collage systems. We also applied the method to Sequitur, another promising compression program, and verified its performance. Moreover, we studied efficient fully compressed pattern matching for balanced straight-line programs, where not only the text but also the pattern is compressed. We also developed an online algorithm that constructs a subsequence automaton from a given set of strings; the automaton accepts every subsequence of any string in the set. The algorithm is the fastest, and we verified that it is quite useful for accelerating a knowledge discovery system. Concerning knowledge discovery from databases, we studied the learnability of tree transformation rules from examples and the search for optimal word-association rules in large text databases.
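
To make concrete what a subsequence automaton for a set of strings accepts, the following Python sketch simulates one naively. It is only an illustration under stated assumptions: it is not the online construction algorithm developed in this project, and the names `build_next_tables` and `is_subsequence_of_any` are hypothetical. Each automaton state is represented as one position per text, and a character moves every surviving position past the next occurrence of that character.

```python
from typing import Dict, List, Optional, Sequence


def build_next_tables(texts: Sequence[str]) -> List[List[Dict[str, int]]]:
    """tables[i][p][c] is the smallest index q >= p with texts[i][q] == c."""
    tables: List[List[Dict[str, int]]] = []
    for text in texts:
        table: List[Dict[str, int]] = [{} for _ in range(len(text) + 1)]
        # Fill from right to left so each row extends the row after it.
        for p in range(len(text) - 1, -1, -1):
            table[p] = dict(table[p + 1])
            table[p][text[p]] = p
        tables.append(table)
    return tables


def is_subsequence_of_any(pattern: str, texts: Sequence[str]) -> bool:
    """Return True if pattern is a subsequence of at least one text."""
    tables = build_next_tables(texts)
    # One position per text; None marks a text that can no longer match.
    state: List[Optional[int]] = [0] * len(texts)
    for c in pattern:
        state = [
            tables[i][p][c] + 1 if p is not None and c in tables[i][p] else None
            for i, p in enumerate(state)
        ]
        if not any(p is not None for p in state):
            return False  # every text is dead: reject early
    return any(p is not None for p in state)


if __name__ == "__main__":
    texts = ["abcab", "bca"]
    print(is_subsequence_of_any("aab", texts))  # True: a, a, b in "abcab"
    print(is_subsequence_of_any("cba", texts))  # False: no text contains c..b..a
```

This simulation stores a full next-occurrence table per text position, which is simple but memory-hungry; a real subsequence automaton merges equivalent states instead.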

Report

(4 results)
  • 2000 Annual Research Report   Final Research Report Summary
  • 1999 Annual Research Report
  • 1998 Annual Research Report
  • Research Products

    (67 results)

All Publications (67 results)

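  • [Publications] M.Miyazaki et al.: "Speeding up the pattern matching machine for compressed texts"Trans. Information Processing Society of Japan. 39. 2638-2648 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"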
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto: "Finding a one-variable pattern from incomplete data"Proc. 9th International Conference on Algorithmic Learning Theory. LNAI1501. 234-246 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata et al.: "Pattern Matching in Text Compressed by Using Antidictionaries"Proc.10th Ann.Symp.Combinatorial Pattern Matching. LNCS1645. 37-49 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida et al.: "Shift-And Approach to Pattern Matching in LZW Compressed Text"Proc.10th Ann.Symp.Combinatorial Pattern Matching. LNCS1645. 1-13 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida et al.: "A Unifying Framework for Compressed Pattern Matching"Proc.6th Int.Symp.String Processing and Information Retrieval. 89-96 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono: "Alphabet indexing for approximating features of symbols"Theoretical Computer Science. 210. 245-260 (1999)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata et al.: "Speeding Up Pattern Matching by Text Compression"Proc.4th Italian Conf.on Algorithms and Complexity. LNCS1767. 306-316 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata and T.Matsumoto and M.Takeda and A.Shinohara and S.Arikawa: "A Boyer-Moore type algorithm for compressed pattern matching"Proc.11th Ann.Symp.on Combinatorial Pattern Matching. LNCS1848. 181-194 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto et al.: "Identification of tree translation rules from examples"Proc.5th International Colloquium on Grammatical Inference. LNAI1891. 240-255 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono et al.: "On the hardness of approximating the minimum consistent acyclic DFA and decision diagram"Information Processing Letters. 66. 165-170 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Arimura,S.Shimozono: "Maximizing agreement with a classification by bounded or unbounded number of words"Proc.6th Ann.Int.Symp.on Algorithms and Computation. LNAI1533. 39-48 (1998)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Miyazaki,A.Shinohara and M.Takeda: "An Improved Pattern Matching Algorithm for Strings in terms of Straight-line Programs"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida,M.Takeda,A.Shinohara,M.Miyazaki and S.Arikawa: "Multiple Pattern Matching in LZW Compressed Text"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata and M.Takeda and A.Shinohara and S.Arikawa: "Pattern Matching in Text Compressed by Using Antidictionaries"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono,H.Arimura and S.Arikawa: "Efficient discovery of optimal word-association patterns in large text databases"New Generation Computing. 18. 49-60 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto,H.Arimura and S.Arikawa: "Identification of Tree Translation Rules from Examples"Proc.5th International Colloquium on Grammatical Inference. LNAI1891. 241-255 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Hirao and A.Shinohara and M.Takeda and S.Arikawa: "Fully compressed pattern matching algorithm for balanced straight-line programs"Proc.7th International Symposium on String Processing and Information Retrieval (SPIRE2000). 132-138 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Hoshino,A.Shinohara,M.Takeda and S.Arikawa: "Online construction of subsequence automata for multiple texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 146-152 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Matsumoto,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Bit-parallel approach to approximate string matching in compressed texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 221-228 (2000)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] G.Navarro,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Faster Approximate String Matching over Compressed Text"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Mitarai,M.Hirao,T.Matsumoto,A.Shinohara,M.Takeda and S.Arikawa: "Compressed Pattern Matching for Sequitur"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Takeda et al.: "Speeding up string pattern matching by text compression : The dawn of a new era"Trans. Information Processing Society of Japan. 42(3). (2001)

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Miyazaki et al.: "Speeding up the pattern matching machine for compressed texts."Trans. Information Processing Society of Japan. 39(9). 2638-2648 (1998)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto: "Finding a one-variable pattern from incomplete data."Proc. 9th International Conference on Algorithmic Learning Theory, LNAI. 1501. 234-246 (1998)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata et al: "Pattern Matching in Text Compressed by Using Antidictionaries"Proc. 10th Ann. Symp. Combinatorial Pattern Matching. LNCS1645. 37-49 (1999)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida et al: "Shift-And Approach to Pattern Matching in LZW Compressed Text"Proc. 10th Ann. Symp. Combinatorial Pattern Matching. LNCS1645. 1-13 (1999)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida et al: "A Unifying Framework for Compressed Pattern Matching"Proc. 6th Int. Symp. String Processing and Information Retrieval. 89-96 (1999)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono: "Alphabet indexing for approximating features of symbols"Theoretical Computer Science. 210. 245-260 (1999)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata et al: "Speeding Up Pattern Matching by Text Compression"Proc. 4th Italian Conf. on Algorithms and Complexity. LNCS1767. 306-316 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata and T.Matsumoto and M.Takeda and A.Shinohara and S.Arikawa: "A Boyer-Moore type algorithm for compressed pattern matching"Proc. 11th Ann. Symp. on Combinatorial Pattern Matching. LNCS1848. 181-194 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto et al: "Identification of tree translation rules from examples"Proc. 5th International Colloquium on Grammatical Inference. LNAI1891. 240-255 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono et al: "On the hardness of approximating the minimum consistent acyclic DFA and decision diagram"Information Processing Letters. 66. 165-170 (1998)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Arimura, S.Shimozono: "Maximizing agreement with a classification by bounded or unbounded number of words"Proc. 6th Ann. Int. Symp. on Algorithms and Computation. LNAI1533. 39-48 (1998)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Miyazaki, A.Shinohara and M.Takeda: "An Improved Pattern Matching Algorithm for Strings in terms of Straight-line Programs"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Kida, M.Takeda, A.Shinohara, M.Miyazaki and S.Arikawa: "Multiple Pattern Matching in LZW Compressed Text"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] Y.Shibata and M.Takeda and A.Shinohara and S.Arikawa: "Pattern Matching in Text Compressed by Using Antidictionaries"Journal of Discrete Algorithms. 1(1). (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Shimozono, H.Arimura and S.Arikawa: "Efficient discovery of optimal word-association patterns in large text databases"New Generation Computing. 18. 49-60 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Sakamoto, H.Arimura and S.Arikawa: "Identification of Tree Translation Rules from Examples"Proc. 5th International Colloquium on Grammatical Inference. LNAI1891. 241-255 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Hirao and A.Shinohara and M.Takeda and S.Arikawa: "Fully compressed pattern matching algorithm for balanced straight-line programs"Proc. 7th International Symposium on String Processing and Information Retrieval. 132-138 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] H.Hoshino, A.Shinohara, M.Takeda and S.Arikawa: "Online construction of subsequence automata for multiple texts"Proc. 7th International Symposium on String Processing and Information Retrieval. 146-152 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Matsumoto, T.Kida, M.Takeda, A.Shinohara and S.Arikawa: "Bit-parallel approach to approximate string matching in compressed texts"Proc. 7th International Symposium on String Processing and Information Retrieval. 221-228 (2000)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] G.Navarro, T.Kida, M.Takeda, A.Shinohara and S.Arikawa: "Faster Approximate String Matching over Compressed Text"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] S.Mitarai, M.Hirao, T.Matsumoto, A.Shinohara, M.Takeda and S.Arikawa: "Compressed Pattern Matching for Sequitur"Data Compression Conference 2001. (2001)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Takeda et al: "Speeding up string pattern matching by text compression : The dawn of a new era"Trans. Information Processing Society of Japan. 42(3). (2001)

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2000 Final Research Report Summary
  • [Publications] M.Miyazaki,A.Shinohara and M.Takeda: "An Improved Pattern Matching Algorithm for Strings in terms of Straight-line Programs"Journal of Discrete Algorithms. 1(1). (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] T.Kida,M.Takeda,A.Shinohara,M.Miyazaki and S.Arikawa: "Multiple Pattern Matching in LZW Compressed Text"Journal of Discrete Algorithms. 1(1). (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Y.Shibata and M.Takeda and A.Shinohara and S.Arikawa: "Pattern Matching in Text Compressed by Using Antidictionaries"Journal of Discrete Algorithms. 1(1). (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] S.Shimozono,H.Arimura and S.Arikawa: "Efficient discovery of optimal word-association patterns in large text databases"New Generation Computing. 18. 49-60 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Y.Shibata and T.Matsumoto and M.Takeda and A.Shinohara and S.Arikawa: "A Boyer-Moore type algorithm for compressed pattern matching"Proc.11th Ann.Symp.on Combinatorial Pattern Matching. LNCS 1848. 181-194 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] H.Sakamoto,H.Arimura and S.Arikawa: "Identification of Tree Translation Rules from Examples"Proc.5th International Colloquium on Grammatical Inference. LNAI 1891. 241-255 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] M.Hirao and A.Shinohara and M.Takeda and S.Arikawa: "Fully compressed pattern matching algorithm for balanced straight-line programs"Proc.7th International Symposium on String Processing and Information Retrieval (SPIRE2000). 132-138 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] H.Hoshino,A.Shinohara,M.Takeda and S.Arikawa: "Online construction of subsequence automata for multiple texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 146-152 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] T.Matsumoto,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Bit-parallel approach to approximate string matching in compressed texts"Proc.of 7th International Symposium on String Processing and Information Retrieval. 221-228 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] G.Navarro,T.Kida,M.Takeda,A.Shinohara and S.Arikawa: "Faster Approximate String Matching over Compressed Text"Data Compression Conference 2001. (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] S.Mitarai,M.Hirao,T.Matsumoto,A.Shinohara,M.Takeda and S.Arikawa: "Compressed Pattern Matching for Sequitur"Data Compression Conference 2001. (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] M.Takeda,Y.Shibata,T.Matsumoto,T.Kida,A.Shinohara,S.Fukamachi,T.Shinohara and S.Arikawa: "Speeding up string pattern matching by text compression : The dawn of a new era"Trans. Information Processing Society of Japan. 42(3). (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Y.Shibata et al.: "Pattern Matching in Text Compressed by Using Antidictionaries"Proc.10th Ann.Symp.on Combinatorial Pattern Matching. LNCS1645. 37-49 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] T.Kida et al.: "Shift-And Approach to Pattern Matching in LZW Compressed Text"Proc.10th Ann.Symp.on Combinatorial Pattern Matching. LNCS1645. 1-13 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] T.Kida et al.: "A Unifying Framework for Compressed Pattern Matching"Proc. 6th Int. Symp. on String Processing and Information Retrieval. 89-96 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] K.Tamari et al.: "Discovering Poetic Allusion in Anthologies of Classical Japanese Poems"Proc. 2nd Int. Conf. on Discovery Science. LNAI1721. 128-138 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] Y.Shibata et al.: "Speeding Up Pattern Matching by Text Compression"Proc. 4th Italian Conf.on Algorithms and Complexity. LNCS1767. 306-316 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] M. Miyazaki et al.: "Speeding up the pattern matching machine for compressed texts" Trans. Information Processing Society of Japan. 39. 2638-2648 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] T. Kida et al.: "Multiple pattern matching in LZW compressed text" Data Compression Conference 1998. 103-113 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] M. Yamasaki et al.: "Discovering characteristic patterns from collections of classical Japanese Poems" Proc. 1st Int. Conf. on Discovery Science. LNAI1532. 129-140 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] Y. Hayashi et al.: "Uniform characterizations of polynomial-query learnabilities" Proc. 1st Int. Conf. on Discovery Science. LNAI1532. 84-92 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] S. Shimozono et al.: "On the hardness of approximating the minimum consistent acyclic DFA and decision diagram" Information Processing Letters. 66. 165-170 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] H. Arimura, S. Shimozono: "Maximizing agreement with a classification by bounded or unbounded number of words" Proc. 6th Ann. Int. Symp. on Algorithms and Computation. LNAI1533. 39-48 (1998)

    • Related Report
      1998 Annual Research Report

Published: 1998-04-01   Modified: 2016-04-21  
