Studies on fast pattern matching algorithms based on text compressions

Research Project

Project/Area Number	09680343
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Computer Science
Research Institution	KYUSHU UNIVERSITY
Principal Investigator	TAKEDA Masayuki, Associate Professor, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVERSITY (50216909)
Co-Investigator(Kenkyū-buntansha)	SHINOHARA Ayumi, Associate Professor, Graduate School of Information Science and Electrical Engineering, KYUSHU UNIVER (00226151)
Project Period (FY)	1997 – 1998
Project Status	Completed (Fiscal Year 1998)
Budget Amount	¥700,000 (Direct Cost: ¥700,000); Fiscal Year 1998: ¥700,000 (Direct Cost: ¥700,000)
Keywords	pattern matching in compressed texts / speeding up pattern matching by text compression / multiple pattern matching / LZW compression / Huffman encoding / finite-state encoding / byte-pair encoding / pattern matching / text compression / text database / information retrieval / data compression
Research Abstract	Text compression aims to reduce the space needed to store files on secondary disk storage, so the traditional criterion for choosing a compression method is the compression ratio. In this project we propose a new criterion: the efficiency of string pattern matching in compressed text without decoding. The project has two goals. Goal 1: a search in compressed text that is faster than decompression followed by a simple search. Goal 2: a search in compressed text that is faster than a simple search in uncompressed text. The main results of the two years of research are as follows. (1) We developed and implemented a multiple pattern matching algorithm for text compressed by the LZW method, which is used in the UNIX compress command. (2) We also devised a more efficient algorithm for a single pattern in LZW compressed text, based on the Shift-And approach. (3) We showed experimentally that the algorithms of (1) and (2) are approximately twice as fast as decompression followed by a simple search; that is, we achieved Goal 1. (4) We showed experimentally that the algorithms of (1) and (2) are faster than a simple search on uncompressed text; that is, we achieved Goal 2. (5) We also developed and evaluated compressed pattern matching algorithms for other compression methods, such as byte pair encoding, Huffman encoding, finite-state encoding, and compression using antidictionaries. The project was completed successfully.
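
Result (2) in the abstract builds on the Shift-And approach. For orientation only, the following is a minimal sketch of the classic bit-parallel Shift-And matcher on plain, uncompressed text. It is not the project's algorithm for LZW compressed text, which the abstract does not spell out; the function name `shift_and_search` and its interface are assumptions made for this illustration.

```python
def shift_and_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Classic Shift-And on uncompressed text (illustrative only; the
    compressed-text variant from the project is not reproduced here).
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a complete match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("abracadabra", "abra"))  # [0, 7]
```

Each text character costs one shift, one OR, and one AND on the state word, which is why the approach is attractive when the pattern fits in a machine word. Python integers are unbounded, so this sketch does not enforce that limit.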

Report

(3 results)

1998 Annual Research Report
Final Research Report Summary
1997 Annual Research Report

Research Products
(18 results)

[Publications] Takeda, M.: "Pattern matching machine for text compressed using finite state model" Technical Report DOI-TR-142, Kyushu University. 1-12 (1997)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Kida, T. et al.: "Multiple Pattern Matching in LZW Compressed Text" Proc. Data Compression Conference, DCC'98. 103-112 (1998)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
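[Publications] Miyazaki, M. et al.: "Speeding up the pattern matching machine for compressed texts" Transactions of Information Processing Society of Japan. 39-9. 2638-2648 (1998)
- Description
  From "Final Research Report Summary (Japanese)"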
- Related Report
  1998 Final Research Report Summary
[Publications] Yamasaki, M. et al.: "Discovering characteristic patterns from collections of classical Japanese poems" Proc. 1st International Conference on Discovery Science. 129-140 (1998)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Kida, T. et al.: "Shift-And approach to pattern matching in LZW compressed text" Technical Report DOI-TR-156, Kyushu University. 1-13 (1999)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Shibata, Y. et al.: "Pattern matching in text compressed by using antidictionaries" Technical Report DOI-TR-157, Kyushu University. 1-12 (1999)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  1998 Final Research Report Summary
[Publications] Takeda, M.: "Pattern matching machine for text compressed using finite state model" Technical Report DOI-TR-142, Kyushu University. 1-12 (1997)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Kida, T. et al.: "Multiple pattern matching in LZW compressed text" Proc. Data Compression Conference (DCC98). 103-112 (1998)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Miyazaki, M. et al.: "Speeding up the pattern matching machine for compressed texts" Transactions of Information Processing Society of Japan. Vol.39, No.9. 2638-2648 (1998)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Yamasaki, M. et al.: "Discovering characteristic patterns from collections of classical Japanese poems" Proc. 1st International Conference on Discovery Science. 129-140 (1998)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Kida, T. et al.: "Shift-And approach to pattern matching in LZW compressed text" Technical Report DOI-TR-156, Kyushu University. 1-13 (1999)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Shibata, Y. et al.: "Pattern matching in text compressed by using antidictionaries" Technical Report DOI-TR-157, Kyushu University. 1-12 (1999)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  1998 Final Research Report Summary
[Publications] Miyazaki, M.: "Speeding up the pattern matching machine for compressed texts" Transactions of Information Processing Society of Japan. 39-9. 2638-2648 (1998)
- Related Report
  1998 Annual Research Report
[Publications] T. Kida et al.: "Shift-And approach to pattern matching in LZW texts" Technical Report, Department of Informatics, Kyushu Univ. 156. 1-12 (1999)
- Related Report
  1998 Annual Research Report
[Publications] M. Yamasaki et al.: "Discovering Characteristic Patterns from Collections of Classical Japanese Poems" Lecture Notes in Artificial Intelligence. 1532. 129-140 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Y. Shibata et al.: "Pattern matching in texts Compressed by using Antidictionaries" Technical Report, Department of Informatics, Kyushu Univ. 157. 1-12 (1999)
- Related Report
  1998 Annual Research Report
[Publications] T.Kida et al.: "Multiple Pattern Matching in LZW Compressed Text" Proc.Data Compression Conference,DCC'98. (to appear). (1998)
- Related Report
  1997 Annual Research Report
[Publications] Yamasaki, M. et al.: "Pattern extraction from waka poem data using the MDL principle" IPSJ SIG Technical Report. 37-5. 29-34 (1998)
- Related Report
  1997 Annual Research Report
