FY2019 Annual Research Report

Development of Large-Scale Data Management and Processing Techniques Based on String Compression and Combinatorics

Research Project

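Project/Area Number	18F18120
Research Institution	Kyushu University
Principal Investigator	Shunsuke Inenaga, Kyushu University, Faculty of Information Science and Electrical Engineering, Associate Professor (60448404)
Co-Investigator	KOEPPL DOMINIK, Kyushu University, Faculty of Information Science and Electrical Engineering, JSPS International Research Fellow
Project Period (FY)	2018-10-12 – 2021-03-31
Keywords	data structures / algorithms / lossless compression / text indexing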
Summary of Research Achievements	A major step towards data structures that perform better in practice was an in-depth analysis of hash tables. We worked with Shunsuke Kanda and Kazuya Tsuruta on trie data structures that employ hash tables to speed up queries or to reduce space usage. On a more general topic, I (Koeppl) devised, together with Rajeev Raman and Simon Puglisi, two compact hash tables that are optimized for fast construction while using less memory than any other known hash table. These hash tables improve associative containers in situations where inserting large amounts of data is the most important operation. The work with Shunsuke Kanda et al. has been submitted to a journal, the work with Kazuya Tsuruta et al. was accepted at DCC 2020, and the work with Rajeev Raman and Simon Puglisi was accepted at SEA 2020. A second research focus of mine (Koeppl) is the bijective Burrows-Wheeler transform (BBWT) [Gil and Scott, arXiv 2012]. We devised a self-index on the BBWT, which resulted in a conference paper at CPM 2019. Next, we found a connection between the BBWT and suffix sorting, which led to a linear-time construction algorithm. We published this result on arXiv and plan to submit it together with a practical evaluation. To better understand the relation between the BBWT and the BWT, we studied conversions between the two transforms together with researchers from Prof. Ayumi Shinohara's laboratory at Tohoku University; the findings of this study were accepted at CPM 2020.
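For readers unfamiliar with the BBWT, the following is a minimal sketch of its textbook definition, not the linear-time construction mentioned in the summary. It factorizes the text into Lyndon words with Duval's algorithm, sorts all cyclic rotations of all factors by comparing their infinite repetitions, and outputs the last character of each rotation. The function names `lyndon_factors` and `bbwt` are illustrative assumptions, and the sorting step is deliberately naive (roughly quadratic space and time).

```python
def lyndon_factors(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # Restart the comparison pointer on a strictly larger character,
            # otherwise keep extending the current periodic run.
            k = i if s[k] < s[j] else k + 1
            j += 1
        period = j - k
        while i <= k:
            factors.append(s[i:i + period])
            i += period
    return factors


def bbwt(s):
    """Naive bijective BWT: illustration only, not a linear-time construction."""
    n = len(s)
    rotations = []
    for w in lyndon_factors(s):
        for r in range(len(w)):
            rot = w[r:] + w[:r]
            # Comparing the first 2n characters of the infinite repetitions
            # is enough to order two rotations of length at most n.
            key = (rot * (2 * n // len(rot) + 1))[:2 * n]
            rotations.append((key, rot[-1]))
    rotations.sort(key=lambda pair: pair[0])
    return "".join(last for _, last in rotations)


if __name__ == "__main__":
    print(lyndon_factors("banana"))  # ['b', 'an', 'an', 'a']
    print(bbwt("banana"))            # annbaa
```

Unlike the classic BWT, this transform needs no end-of-text sentinel, which is why it is a bijection on strings.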
Current Status of Progress (Category)	2: Progressing rather smoothly. Reason: It is hard to judge whether the project is delayed or on schedule. The most recent results have been accepted at conferences (two at DCC 2020, one at CPM 2020, and one at SEA 2020), but the proceedings are not yet available. I do not expect any of the journal articles that my colleagues and I submitted during the JSPS program to be published before the scholarship ends, because the publication process in theoretical computer science, especially at renowned journals such as Algorithmica or TCS, unfortunately takes a very long time. The current results also raise new research questions, which I probably cannot answer completely during the JSPS program. Overall, I am satisfied with the current research status, and I am confident that the achievements of the two-year program will be considered worthwhile.
Strategy for Future Research Activity	For the coming six months, I have two projects in mind. The first is to analyze different techniques for speeding up and slimming down the computation of the Lempel-Ziv 78 factorization, for which we have already developed the main tools, such as a compact hash table (i.e., the SEA 2020 publication). The plan is to compile an exhaustive study suitable for submission to a journal. The second is to find new ways of indexing integer and real matrices within compressed space. The aim is to augment a grammar that compresses the matrix with an indexing data structure that accelerates common matrix operations such as multiplication. Currently, there are no sophisticated approaches that let a grammar sufficiently exploit the structure of two-dimensional data. The first objective is to propose an approach that exploits the shape of the two-dimensional data in such a way that the grammar is much smaller than a string grammar built on the serialization of the matrix. The second objective is to propose an indexing data structure for common matrix operations that needs less space than the plain matrix while performing each operation faster. Another line of research on this topic is to study ways of computing already proposed grammars in less time, ideally in optimal time in the word-packing model.
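To illustrate why a hash table is a central tool for the Lempel-Ziv 78 factorization, here is a minimal sketch that stores the LZ78 trie in a hash table keyed by (parent node, character) pairs. It uses Python's built-in `dict` as a stand-in; the compact hash table of the SEA 2020 publication is not reproduced here, and the names `lz78_factorize` and `lz78_decode` are illustrative assumptions.

```python
def lz78_factorize(text):
    """Return LZ78 factors as (referred_phrase_id, new_char) pairs.

    Phrase id 0 is the empty phrase. The trie is kept in a hash table
    mapping (parent_node, char) -> child_node, so each text character
    costs one lookup and at most one insertion.
    """
    trie = {}
    factors = []
    node, next_id = 0, 1
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                  # keep walking down the trie
        else:
            trie[(node, ch)] = next_id    # new phrase = old phrase + ch
            next_id += 1
            factors.append((node, ch))
            node = 0                      # restart at the root
    if node != 0:
        # The text ended inside an already known phrase.
        factors.append((node, ""))
    return factors


def lz78_decode(factors):
    """Inverse of lz78_factorize, used here only to check the round trip."""
    phrases = [""]
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


if __name__ == "__main__":
    f = lz78_factorize("abababa")
    print(f)               # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
    print(lz78_decode(f))  # abababa
```

Because every character triggers a hash table operation, the memory footprint and insertion speed of the table dominate the cost of the factorization, which is the motivation for swapping in a more compact table.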

Research Products
(15 results)

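All: International Joint Research (4), Journal Articles (3) (of which international co-authorship: 2, peer-reviewed: 3, open access: 2), Presentations (7) (of which international conferences: 3, invited talks: 1), Remarks (1)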

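[International Joint Research] TU Dortmund/Goethe University Frankfurt (Germany)
- Country
  Germany
- Foreign institution
  TU Dortmund/Goethe University Frankfurt
[International Joint Research] Helsinki University (Finland)
- Country
  Finland
- Foreign institution
  Helsinki University
[International Joint Research] Nicolaus Copernicus University (Poland)
- Country
  Poland
- Foreign institution
  Nicolaus Copernicus University
[International Joint Research] University of Leicester (United Kingdom)
- Country
  United Kingdom
- Foreign institution
  University of Leicester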
[Journal Article] Indexing the Bijective BWT (2019)
- Authors
  Hideo Bannai, Juha Karkkainen, Dominik Koeppl, Marcin Piatkowski
- Journal title
  Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching - CPM 2019
  Volume: 128 in LIPIcs series, Pages: 17:1-17:14
- DOI
  https://doi.org/10.4230/LIPIcs.CPM.2019.17
- Peer-reviewed / Open access / International co-authorship
[Journal Article] Bidirectional Text Compression in External Memory (2019)
- Authors
  Patrick Dinklage, Jonas Ellert, Johannes Fischer, Dominik Koeppl, Manuel Penschuck
- Journal title
  Proceedings of the 27th Annual European Symposium on Algorithms - ESA 2019
  Volume: 144 in LIPIcs series, Pages: 41:1-41:16
- DOI
  https://doi.org/10.4230/LIPIcs.ESA.2019.41
- Peer-reviewed / Open access / International co-authorship
[Journal Article] Compact Data Structures for Shortest Unique Substring Queries (2019)
- Authors
  Takuya Mieno, Dominik Koeppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
- Journal title
  Proceedings of the 26th International Symposium on String Processing and Information Retrieval - SPIRE 2019
  Volume: 11811 in LNCS, Pages: 107-123
- DOI
  https://doi.org/10.1007/978-3-030-32686-9_8
- Peer-reviewed
[Presentation] Constructing the Bijective BWT (2020)
- Authors
  Dominik Koeppl
- Conference name
  The 28th London Stringology Days & London Algorithmic Workshop - LAWS&LSD 2020
- International conference
[Presentation] In-Place Bijective Burrows Wheeler Transformations (2020)
- Authors
  Dominik Koeppl, Daiki Hashimoto, Diptarama Hendrian, Ayumi Shinohara
- Conference name
  Data Structures in Bioinformatics workshop - DSB2020
- International conference
[Presentation] Constructing the Bijective BWT (2019)
- Authors
  Hideo Bannai, Juha Karkkainen, Dominik Koeppl, Marcin Piatkowski
- Conference name
  175th Meeting on Algorithms (アルゴリズム研究会), 2019
[Presentation] Re-Pair In-Place (2019)
- Authors
  Dominik Koeppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto
- Conference name
  LA Symposium Summer 2019
[Presentation] Separate Chaining Meets Compact Hashing (2019)
- Authors
  Dominik Koeppl
- Conference name
  173rd Meeting on Algorithms (アルゴリズム研究会)
[Presentation] Dynamic Trie Tailored for Fast Prefix Searches (2019)
- Authors
  Kazuya Tsuruta, Dominik Koeppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
- Conference name
  LA Symposium Summer 2019
[Presentation] Searching Patterns in the Bijective BWT (2019)
- Authors
  Dominik Koeppl
- Conference name
  Dagstuhl Seminar 19241 "25 Years of the Burrows-Wheeler Transform"
- International conference / Invited talk
[Remarks] Homepage of Dominik Koeppl
- URL
  https://dkppl.de/
