Project/Area Number |
21K17701
|
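Research Institution | Tokyo Medical and Dental University |
Principal Investigator |
Koeppl Dominik, Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)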
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Keywords | compression techniques / indexing data structures / matrix compression / algorithm engineering |
Outline of Research Achievements |
This research lies at the intersection of compression techniques, indexing data structures, and algorithm engineering on modern computer architectures. Regarding compression, we proposed a space-efficient algorithm that computes the reversed LZ-factorization in linear time. This algorithm can be modified to compute the longest previous non-overlapping reverse factor table. Next, we proposed a new representation of Lempel-Ziv 77 factors, which are usually represented by pairs of text offset and length. We replaced the text offsets with offsets into the list of co-lexicographically sorted prefixes read up to the starting position of the respective factor. Empirically, these offsets tend to be smaller than the text offsets, which reduces the final compressed size when the pairs are coded with a universal coder. We also provided a linear-time construction algorithm for the bijective Burrows-Wheeler transform (BBWT), which can be used for data compression and for compressed text indexes. Regarding compressed text indexes, we presented text indexes built on grammars based on suffix sorting. We showed that such a grammar exhibits locality-sensitive properties: a pattern can be found in the text efficiently by constructing the same grammar on the pattern and searching for the pattern's non-terminals in the grammar tree of the text. Finally, we moved beyond one-dimensional data and proposed a vector-matrix multiplication on adjacency matrices compressed space-efficiently by extracting their bicliques.
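
To make the BBWT mentioned above concrete, the following is a minimal sketch that computes it naively, not the linear-time construction algorithm from the project. It assumes the usual definition: factorize the text into Lyndon words (here with Duval's algorithm), collect all rotations of all factors, sort them by comparing their infinite repetitions, and output the last character of each rotation. The function names `lyndon_factors` and `bbwt` are chosen for illustration only.

```python
def lyndon_factors(s):
    """Lyndon factorization of s via Duval's algorithm."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # Restart the comparison on a strictly larger character,
            # otherwise keep matching the current period.
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit the repeated Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bbwt(s):
    """Naive BBWT: sort all rotations of all Lyndon factors in omega-order."""
    n = len(s)
    rotations = []
    for factor in lyndon_factors(s):
        for r in range(len(factor)):
            rotations.append(factor[r:] + factor[:r])

    def omega_key(u):
        # A prefix of length 2n of the infinite repetition of u suffices,
        # because any two rotations u, v satisfy |u| + |v| <= 2n.
        reps = -(-2 * n // len(u))  # ceiling division
        return (u * reps)[:2 * n]

    rotations.sort(key=omega_key)
    return "".join(u[-1] for u in rotations)


if __name__ == "__main__":
    print(lyndon_factors("banana"))
    print(bbwt("banana"))
```

The sorting step dominates the cost here (quadratic space for the keys), so this sketch is only suitable for checking small examples by hand.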
|
Current Status of Research Progress (Category) |
Current Status of Research Progress (Category)
2: Progressing generally smoothly
Reason
Thanks to my research collaborators, I was able to advance smoothly in both our joint research plans and my individual research. The research achievements resulted in several journal articles (MDPI Algorithms, MDPI Information, SN Computer Science, Information and Computation) and international conference papers (SPIRE'21, DCC'22, CPM'21). I am confident that these achievements will serve as a solid foundation for further progress in the new fiscal year.
|
Strategy for Future Research Activity |
Although we have proposed algorithms for constructing the BBWT [Koeppl et al., CPM'20][Bannai et al., CPM'21] and an index on top of the BBWT [Bannai et al., CPM'19], the compression quality of the BBWT, which is expressed by the number of its character runs, is still unknown to us. Therefore, we want to study the relation between the number of character runs in the BBWT and in the traditional BWT. A first step in this direction would be the study of particular string families. Here, we want to study the special shape of the BWT for strings whose suffix arrays form arithmetic progressions. Regarding compression, we want to study space-efficient ways to perform Huffman-based compression/decompression in constant time per character/codeword. Ideally, we want to find lower bounds on the space and give solutions with space requirements close to these bounds. Another compression technique useful for matrix-vector multiplication could be grammar compression. Here, we want to study grammars that support such a multiplication time-efficiently. Additionally, we want to devise a pre-computation step that improves the compressibility of a given input matrix by leveraging the fact that we are allowed to permute its columns and rows. Finally, we want to extend our practical implementation of compact hash tables [Koeppl et al., SEA'20] with SIMD instructions to improve query times. SIMD instructions should help us sustain practical performance if we partition a hash table into relatively large unsorted buckets, on which we perform linear search.
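
As an illustration of the compression measure named above, the following minimal sketch counts the character runs of the traditional BWT of a small string. It is a naive rotation-sorting construction for hand-checkable inputs only. The names `bwt` and `count_runs` are illustrative, and the sketch assumes that the sentinel `$` does not occur in the input and is smaller than every input character.

```python
from itertools import groupby


def bwt(s, sentinel="$"):
    """Naive BWT of s + sentinel by sorting all rotations.

    Assumption: the sentinel does not occur in s and compares smaller
    than every character of s.
    """
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal runs of equal consecutive characters in s."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

Fewer runs mean that run-length coding of the transformed string takes less space, which is why the run count serves as the quality measure when comparing the two transforms.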
|
Reason for Carry-Over of Funds to the Next Fiscal Year |
The planned research, with its focus on modern computer hardware, makes it necessary to invest in recent computer architectures featuring, among others, SIMD instruction sets like AVX-512 or graphics-card computation. The research funding will also be used for research stays at the domestic and international level, as well as for participation in domestic workshops and international conferences.
|