2019 Fiscal Year Annual Research Report
Development of Large-Scale Data Management and Processing Techniques Based on String Compression and Combinatorics
Project/Area Number |
18F18120
|
Research Institution | Kyushu University |
Principal Investigator |
Shunsuke Inenaga, Kyushu University, Faculty of Information Science and Electrical Engineering, Associate Professor (60448404)
|
Co-Investigator(Kenkyū-buntansha) |
KOEPPL DOMINIK, Kyushu University, Faculty of Information Science and Electrical Engineering, JSPS International Research Fellow
|
Project Period (FY) |
2018-10-12 – 2021-03-31
|
Keywords | data structures / algorithms / lossless compression / text indexing |
Outline of Annual Research Achievements |
One of the major steps towards practically improved data structures was an in-depth analysis of hash tables. Here, we worked with Shunsuke Kanda and Katsuya Tsuruta on different trie data structures that employ hash tables to speed up queries or to reduce space usage. On a more general topic, I (Koeppl) devised, together with Rajeev Raman and Simon Puglisi, two compact hash tables that are optimized for fast construction while using less memory than any other known hash table. These hash tables help to improve associative containers in situations where inserting large amounts of data is the most vital operation. The work with Shunsuke Kanda et al. has been submitted to a journal, the work with Katsuya Tsuruta et al. was accepted at DCC'2020, and the work with Rajeev Raman and Simon Puglisi was accepted at SEA'2020.
I (Koeppl) set another research focus on the bijective Burrows-Wheeler transform (BBWT) [Gil and Scott, arXiv 2012]. Here, we devised a self-index on the BBWT, resulting in a conference paper at CPM'2019. Next, we found a connection between the BBWT and suffix sorting, resulting in a linear-time construction algorithm. We published this result on arXiv and plan to submit it together with a practical evaluation. To further understand the relation between the BBWT and the BWT, we studied conversions between these two transformations together with researchers from Prof. Ayumi Shinohara's laboratory at Tohoku University; the findings of this study were accepted at CPM'2020.
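For readers unfamiliar with the BBWT, the following is a minimal sketch of its textbook definition: factorize the text into Lyndon words, sort all rotations of all factors by comparing their infinite repetitions, and output the last character of each rotation. It is a naive quadratic-time illustration, not the linear-time construction mentioned above; the function names `lyndon_factors` and `bbwt` are chosen here for illustration only.

```python
def lyndon_factors(text):
    """Duval's algorithm: split text into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(text)
    while i < n:
        j, k = i + 1, i
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        # The scanned prefix is a repetition of one Lyndon word of length j - k.
        while i <= k:
            factors.append(text[i:i + j - k])
            i += j - k
    return factors


def bbwt(text):
    """Naive BBWT: sort the rotations of all Lyndon factors, take last characters."""
    if not text:
        return ""
    factors = lyndon_factors(text)
    rotations = [f[r:] + f[:r] for f in factors for r in range(len(f))]
    # Comparing u^omega with v^omega only needs the first |u| + |v| characters,
    # so a prefix of twice the longest factor length is always enough.
    width = 2 * max(len(f) for f in factors)

    def omega_key(w):
        return (w * (width // len(w) + 1))[:width]

    rotations.sort(key=omega_key)
    return "".join(w[-1] for w in rotations)
```

For example, `lyndon_factors("banana")` yields the factors `b`, `an`, `an`, `a`, and `bbwt("banana")` returns `annbaa`. Unlike the BWT, no end-of-text sentinel is needed.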
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
It is hard to judge whether the current status is delayed or on schedule. The most recent results have been accepted at conferences (two at DCC 2020, one at CPM 2020, and one at SEA 2020), but the proceedings are not yet available. I do not expect any of the journal articles I submitted with my colleagues during the JSPS program to be published before the scholarship ends, as the journal publication process in theoretical computer science, especially in renowned journals like Algorithmica or TCS, unfortunately takes a very long time. The current results also spark new research questions, which I probably cannot answer completely during the JSPS program. Overall, I am satisfied with the current research status, and I am confident that the achievements of the two-year program will be considered worthwhile.
|
Strategy for Future Research Activity |
For the following period of six months, I have two projects in mind. The first is to analyze different tools to speed up and slim down the Lempel-Ziv 78 factorization, for which we have already developed the main tools, such as a compact hash table (i.e., the SEA'2020 publication). The plan is to produce an exhaustive study suitable for submission to a journal. The second is to find new possibilities for indexing integer and real matrices within compressed space. The aim is to augment the computed grammar with an indexing data structure that accelerates common matrix operations such as multiplication. There are currently no sophisticated approaches that sufficiently exploit two-dimensional data by means of a grammar. The first objective would be to propose an approach that exploits the shape of the two-dimensional data in such a way that the grammar is much smaller than a string grammar built on the serialization of a matrix. The second objective would be to propose an indexing data structure for common matrix operations that needs less space than the plain matrix while performing an operation faster. Another line of research on this topic is to study ways of computing already proposed grammars in less time, ideally in optimal time in the word-packing model.
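To illustrate why a hash table is the central tool for the first project, here is a minimal sketch of the Lempel-Ziv 78 factorization in which the LZ78 trie is stored as a hash table keyed by (parent node, character). It uses Python's built-in `dict` as a stand-in; it is not the compact hash table of the SEA'2020 work, and the function names are chosen here for illustration only.

```python
def lz78_factorize(text):
    """Return LZ78 factors as (referred factor index, new character) pairs.

    Index 0 denotes the empty factor. The trie is kept implicitly in a hash
    table that maps (parent node id, character) to the child node id.
    """
    trie = {}
    factors = []
    node = 0  # current trie node; 0 is the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # keep extending the current factor
        else:
            # New factor: its node id equals its 1-based factor index.
            trie[(node, ch)] = len(trie) + 1
            factors.append((node, ch))
            node = 0
    if node != 0:
        # The text ended inside a factor that repeats an earlier one.
        factors.append((node, ""))
    return factors


def lz78_decode(factors):
    """Invert lz78_factorize by rebuilding each factor from its reference."""
    phrases = [""]
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases)
```

Each input character costs one hash-table lookup, so the speed and memory footprint of the table largely determine the speed and memory footprint of the factorization. For `"aaaa"`, the sketch returns `[(0, "a"), (1, "a"), (1, "")]`, i.e. the factors `a`, `aa`, `a`.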
|
[Journal Article] Indexing the Bijective BWT (2019)
Author(s)
Hideo Bannai, Juha Karkkainen, Dominik Koeppl, Marcin Piatkowski
Journal Title
Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching - CPM 2019
Volume: 128 in LIPIcs series
Pages: 17:1-17:14
DOI
Peer Reviewed / Open Access / Int'l Joint Research