1998 Fiscal Year Final Research Report Summary

Studies on fast pattern matching algorithms based on text compressions

Research Project

Project/Area Number	09680343
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Computer Science
Research Institution	KYUSHU UNIVERSITY
Principal Investigator	TAKEDA Masayuki, Associate Professor, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVERSITY (50216909)
Co-Investigator(Kenkyū-buntansha)	SHINOHARA Ayumi, Associate Professor, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVER (00226151)
Project Period (FY)	1997 – 1998
Keywords	pattern matching in compressed texts / speeding up pattern matching by text compression / multiple pattern matching / LZW compression / Huffman encoding / finite-state encoding / byte-pair encoding
Research Abstract	Text compression aims to reduce the amount of secondary disk storage needed to hold files, so the traditional criterion for choosing a compression method is the compression ratio. In this project we propose a new criterion: the efficiency of string pattern matching in compressed text without decoding. The goals of the project are: Goal 1: search in compressed text faster than decompression followed by a simple search. Goal 2: search in compressed text faster than a simple search in uncompressed text. The main results of the two years of research are summarized as follows. (1) We developed and implemented a multiple pattern matching algorithm for text compressed by the LZW method, which is used in the UNIX compress command. (2) We also devised a more efficient algorithm for a single pattern in LZW compressed text, based on the Shift-And approach. (3) We showed experimentally that the algorithms of (1) and (2) are approximately twice as fast as decompression followed by a simple search. That is, we achieved Goal 1. (4) We showed experimentally that the algorithms of (1) and (2) are faster than a simple search on uncompressed text. That is, we achieved Goal 2. (5) We also developed and evaluated compressed pattern matching algorithms for other compression methods, such as byte-pair encoding, Huffman encoding, finite-state encoding, and compression using antidictionaries. The project was completed successfully.
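For readers unfamiliar with the Shift-And approach named in result (2), the following is a minimal sketch of the classic bit-parallel technique applied to plain, uncompressed text. It is an illustration only and is not the project's algorithm for LZW compressed text; the function name `shift_and_search` is hypothetical.

```python
def shift_and_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # mask[c] has bit i set exactly when pattern[i] == c.
    mask: dict[str, int] = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for pos, c in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & mask.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_and_search("abracadabra", "abra")` reports the occurrences starting at positions 0 and 7. Each text character costs a constant number of word operations as long as the pattern fits in a machine word.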

Research Products
(12 results)

[Publications] Takeda, M.: "Pattern matching machine for text compressed using finite state model" Technical Report DOI-TR-142, Kyushu University. 1-12 (1997)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Kida, T. et al.: "Multiple Pattern Matching in LZW Compressed Text" Proc. Data Compression Conference, DCC'98. 103-112 (1998)
- Description
  From "Final Research Report Summary (Japanese)"
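[Publications] Miyazaki, M. et al.: "Speeding up the pattern matching machine for compressed texts" Transactions of Information Processing Society of Japan. 39-9. 2638-2648 (1998)
- Description
  From "Final Research Report Summary (Japanese)"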
[Publications] Yamasaki, M. et al.: "Discovering characteristic patterns from collections of classical Japanese poems" Proc. 1st International Conference on Discovery Science. 129-140 (1998)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Kida, T. et al.: "Shift-And approach to pattern matching in LZW compressed text" Technical Report DOI-TR-156, Kyushu University. 1-13 (1999)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Shibata, Y. et al.: "Pattern matching in text compressed by using antidictionaries" Technical Report DOI-TR-157, Kyushu University. 1-12 (1999)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Takeda, M.: "Pattern matching machine for text compressed using finite state model" Technical Report DOI-TR-142, Kyushu University. 1-12 (1997)
- Description
  From "Final Research Report Summary (English)"
[Publications] Kida, T. et al.: "Multiple pattern matching in LZW compressed text" Proc. Data Compression Conference (DCC98). 103-112 (1998)
- Description
  From "Final Research Report Summary (English)"
[Publications] Miyazaki, M. et al.: "Speeding up the pattern matching machine for compressed texts" Transactions of Information Processing Society of Japan. Vol.39, No.9. 2638-2648 (1998)
- Description
  From "Final Research Report Summary (English)"
[Publications] Yamasaki, M. et al.: "Discovering characteristic patterns from collections of classical Japanese poems" Proc. 1st International Conference on Discovery Science. 129-140 (1998)
- Description
  From "Final Research Report Summary (English)"
[Publications] Kida, T. et al.: "Shift-And approach to pattern matching in LZW compressed text" Technical Report DOI-TR-156, Kyushu University. 1-13 (1999)
- Description
  From "Final Research Report Summary (English)"
[Publications] Shibata, Y. et al.: "Pattern matching in text compressed by using antidictionaries" Technical Report DOI-TR-157, Kyushu University. 1-12 (1999)
- Description
  From "Final Research Report Summary (English)"
