Universal Data Compression by Digram
Project/Area Number | 09650404 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Information and Communication Engineering |
Research Institution | THE UNIVERSITY OF ELECTRO-COMMUNICATIONS |
Principal Investigator | ITOH Shuichi, Graduate School of Information Systems, Professor (00017352) |
Co-Investigator (Kenkyū-buntansha) | HASHIMOTO Takeshi, Faculty of Electro-Communications, Associate Professor (10142308) |
Project Period (FY) | 1997 – 1998 |
Project Status | Completed (Fiscal Year 1998) |
Budget Amount |
¥1,100,000 (Direct Cost: ¥1,100,000)
Fiscal Year 1998: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 1997: ¥500,000 (Direct Cost: ¥500,000)
|
Keywords | lossless data compression / pattern matching / Lempel-Ziv / digram / algorithm / predictive coding / recursive algorithm / bigram |
Research Abstract |
This project was carried out during the 1997-1998 fiscal years to develop and implement a high-speed, efficient lossless data compression algorithm based on digram string matching. We obtained the following results:
1. The dictionary update algorithm is similar to that of the LZMW code. In the LZMW code every string in the dictionary is unique, while our code may store the same string multiple times, so it cannot outperform the LZMW code in compression rate. However, since the algorithm is recursive by design, it is far easier to implement and encodes far faster than LZMW (a sketch of such an update appears after this abstract).
2. The algorithm registers long strings at an early stage of encoding, so its compression performance improves very quickly. It is therefore better suited to compressing sequences of practical length.
3. We found that encoding a position in the dictionary with the commonly used integer codes is not sufficient: the codeword length must be determined by the probability of occurrence. Since the alphabet size grows with the length of the input sequence, this raises the so-called problem of "modeling a source with a large alphabet," for which we developed an algorithm for estimating a smooth probability distribution (a probability-based length assignment is sketched after the dictionary example below).
These results are expected to serve as a basic technology for future lossless compression schemes.
|
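To make result 1 concrete, the following is a minimal sketch in Python of an LZMW-style greedy parser whose new dictionary entry is the digram (concatenation) of the two most recent matches, registered without a uniqueness check. The report does not describe the project's actual data structures; the linear dictionary scan and the name digram_encode are illustrative assumptions, not the authors' implementation.

def digram_encode(data: bytes):
    """Greedy digram-dictionary parse; illustrative sketch only."""
    # Start with all single-byte strings; duplicates may be appended later.
    dictionary = [bytes([b]) for b in range(256)]
    indices = []  # output: positions in the dictionary
    prev = b""    # previous match, the left half of the next digram entry
    i = 0
    while i < len(data):
        # Greedy longest match at position i. A linear scan keeps the
        # sketch short; a practical encoder would use a trie or hashing.
        best = max((s for s in dictionary if data.startswith(s, i)), key=len)
        indices.append(dictionary.index(best))
        if prev:
            # Register the concatenation of the two most recent matches,
            # without checking whether it is already in the dictionary.
            dictionary.append(prev + best)
        prev = best
        i += len(best)
    return indices

print(digram_encode(b"abababab"))  # -> [97, 98, 256, 256, 256]
# The entry b"abab" is registered twice, illustrating how duplicates arise.

Because nothing is searched or deduplicated at update time, each step appends at most one entry; this is the simplicity and speed advantage over LZMW noted in result 1.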
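For result 3, the sketch below assigns each dictionary index a codeword length derived from an estimated occurrence probability instead of a fixed integer code. The report does not specify the project's estimator for a smooth distribution over a growing alphabet; the add-1/2 (Krichevsky-Trofimov) smoothing used here is a stand-in assumption.

import math
from collections import Counter

def codeword_lengths(indices, alphabet_size):
    """Shannon-code lengths from smoothed frequencies; illustrative only."""
    counts = Counter(indices)
    n = len(indices)
    lengths = {}
    for sym in range(alphabet_size):
        # Add-1/2 smoothing keeps the estimate nonzero for unseen symbols.
        p = (counts[sym] + 0.5) / (n + alphabet_size / 2)
        # Ideal length is -log2(p); rounding up gives integer lengths that
        # satisfy the Kraft inequality, so a prefix code always exists.
        lengths[sym] = math.ceil(-math.log2(p))
    return lengths

Applied to the index stream produced by digram_encode, frequently used dictionary positions receive short codewords while rarely used entries (such as duplicates) receive long ones, which is the effect a fixed integer code fails to capture.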