2020 Fiscal Year Annual Research Report
Development of Techniques for Managing and Processing Large-Scale Data Using String Compression and Combinatorics
Project/Area Number |
18F18120
|
Research Institution | Tokyo Medical and Dental University |
Co-Investigator (Kenkyū-buntansha) |
Koeppl Dominik Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)
|
Project Period (FY) |
2018-10-12 – 2021-03-31
|
Keywords | data structures / algorithms / lossless compression / hashing / string data processing / tries |
Outline of Annual Research Achievements |
This research focused on (a) practical and dynamic trie data structures, (b) computing the Re-Pair grammar compression in small space, and (c) advances for the bijective Burrows-Wheeler transform (BBWT), a variant of the Burrows-Wheeler transform (BWT), which is well received in theory as well as in practice for indexing string data. (a) We devised a novel approach for compact hashing, which is the most memory-efficient approach in practice when working with a huge number of integer keys from a bounded domain. Based on this approach, we proposed dynamic trie data structures that work with path decomposition or with trie compaction. (b) Re-Pair, a grammar compressor with high compression ratios, is difficult to compute within a limited amount of memory. Here, we found a quadratic-time algorithm that computes Re-Pair with almost no additional space. We also devised an index data structure built upon a grammar representing the Lyndon tree. This index exploits several properties of Lyndon words to improve on the running time of the currently fastest grammar index, reducing the dependence on the pattern length from quadratic to linear. (c) Finally, we built an indexing data structure on top of the BBWT, showed how to compute the BBWT in-place and how to transform the BWT into the BBWT, and showed how to build the BBWT in linear time. Aside from that, we found space-efficient factorization algorithms for the non-overlapping LZ77 factorization and for the LZ78 substring compression problem. These algorithms run in near-linear time using space asymptotic to the length of the input text in bits.
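To make item (c) more concrete, the following is a minimal sketch of the textbook definition of the BBWT: factorize the text into Lyndon words, collect all rotations of all factors, sort them by comparing their infinite repetitions, and output the last character of each rotation. This is a naive illustration only, not the in-place or linear-time construction developed in this project; the function names `lyndon_factorization` and `bbwt` are hypothetical, and the truncation of each sort key to twice the text length is an assumption that suffices to compare the infinite repetitions of two rotations.

```python
def lyndon_factorization(text):
    """Duval's algorithm: split text into a non-increasing
    sequence of Lyndon words (the Lyndon factorization)."""
    factors = []
    i, n = 0, len(text)
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a prefix of a
        # repetition of a Lyndon word.
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def bbwt(text):
    """Naive BBWT: sort the rotations of all Lyndon factors by
    their infinite repetitions and read off the last characters."""
    n = len(text)
    rotations = []
    for factor in lyndon_factorization(text):
        for s in range(len(factor)):
            rot = factor[s:] + factor[:s]
            # Two infinite repetitions differ within the first
            # len(u) + len(v) <= 2n characters, or not at all.
            key = (rot * (2 * n // len(rot) + 1))[:2 * n]
            rotations.append((key, rot[-1]))
    rotations.sort(key=lambda pair: pair[0])
    return "".join(last for _, last in rotations)


if __name__ == "__main__":
    print(lyndon_factorization("banana"))
    print(bbwt("banana"))
```

For the text `banana`, the Lyndon factors are `b`, `an`, `an`, `a`, and sorting their rotations yields the transform `annbaa`. The sketch spends quadratic time and space on the sort keys, which is exactly the cost that dedicated construction algorithms aim to avoid.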
|
Research Progress Status |
Not entered, because FY2020 (Reiwa 2) is the final fiscal year of the project.
|
Strategy for Future Research Activity |
Not entered, because FY2020 (Reiwa 2) is the final fiscal year of the project.
|