1998 Fiscal Year Final Research Report Summary
Studies on fast pattern matching algorithms based on text compressions
Project/Area Number |
09680343
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Computer Science
|
Research Institution | KYUSHU UNIVERSITY |
Principal Investigator |
TAKEDA Masayuki, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVERSITY, Associate Professor (50216909)
|
Co-Investigator (Kenkyū-buntansha) |
SHINOHARA Ayumi, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVER, Associate Professor (00226151)
|
Project Period (FY) |
1997 – 1998
|
Keywords | pattern matching in compressed texts / speeding up pattern matching by text compression / multiple pattern matching / LZW compression / Huffman encoding / finite-state encoding / byte-pair encoding |
Research Abstract |
The aim of text compression is to reduce the space needed to store files on secondary disk storage. The traditional criterion for choosing a compression method is therefore the compression ratio. In this project we propose a new criterion: the efficiency of string pattern matching in compressed texts without decoding. The goals of this project are:
Goal 1: A search in compressed text that is faster than decompression followed by a simple search.
Goal 2: A search in compressed text that is faster than a simple search in the uncompressed text.
The main results of the two years of research are summarized as follows.
(1) We developed and implemented a multiple pattern matching algorithm for texts compressed by the LZW method, which is used in the compress command in UNIX.
(2) We also devised a more efficient algorithm for a single pattern in LZW compressed texts, which is based on the Shift-And approach.
(3) We showed experimentally that the algorithms of (1) and (2) are approximately twice as fast as decompression followed by a simple search. That is, we have achieved Goal 1.
(4) We showed experimentally that the algorithms of (1) and (2) are faster than a simple search on uncompressed texts. That is, we have achieved Goal 2.
(5) We also developed and evaluated compressed pattern matching algorithms for other compression methods, such as byte-pair encoding, Huffman encoding, finite-state encoding, and compression using antidictionaries.
We have finished this project successfully.
|
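For readers unfamiliar with the Shift-And approach named in result (2), the following is a minimal sketch of the classical bit-parallel Shift-And search on plain, uncompressed text. It is not the project's algorithm for LZW compressed texts, which the summary does not describe in detail; it only illustrates the underlying state-update idea. The function name `shift_and_search` and its interface are assumptions made for this illustration.

```python
def shift_and_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text.

    Illustrative sketch of classical Shift-And on uncompressed text;
    the name and interface are hypothetical, not taken from the report.
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set exactly when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one at bit 0,
        # then keep only the states consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("abracadabra", "abra"))
```

Each text character costs a constant number of bit operations as long as the pattern fits in a machine word, which is what makes the state compact enough to be updated in larger steps than single characters.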