Indexing Massive Datasets with Algorithmic Engineered Compression Techniques on Modern Computer Architectures

Research Project

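Project/Area Number	21K17701
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60010: Theory of informatics-related
Research Institution	University of Yamanashi (2023), Tokyo Medical and Dental University (2021-2022)
Principal Investigator	Koeppl Dominik, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Specially Appointed Associate Professor (50897395)
Project Period (FY)	2021-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount (*note)	4,680 thousand yen (direct costs: 3,600 thousand yen, indirect costs: 1,080 thousand yen). Fiscal year 2023: 1,170 thousand yen (direct costs: 900 thousand yen, indirect costs: 270 thousand yen). Fiscal year 2022: 2,340 thousand yen (direct costs: 1,800 thousand yen, indirect costs: 540 thousand yen). Fiscal year 2021: 1,170 thousand yen (direct costs: 900 thousand yen, indirect costs: 270 thousand yen)
Keywords	compressed indexes / string subsequences / NP-hard problems / straight line programs / collage systems / block trees / parameterized BWT / pattern matching / data compression / matrix multiplication / matrix compression / subsequences / compact hashing / SIMD instructions / hybrid text indexes / compression techniques / indexing data structures / algorithm engineering / lossless compression / hybrid indexes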
Outline of Research at the Start	With massive datasets being generated at an increasing rate, there is a growing need to manage and analyze them efficiently. Our idea for meeting this need is to leverage compression techniques not only to compress data but also to process it so that specific queries can be executed in reasonable time. We aim for practical, time-efficient compressed data structures that bridge the gap between traditional indexing solutions and compression techniques by embracing modern computer architectures.
Outline of Annual Research Achievements	Following the research plan for fiscal year 2023, we focused on three thematic areas: extending string regularities from substrings to subsequences, exploring NP-hard problems on strings, and refining compressed indexing data structures. In the first area, we improved on the space and time bounds for computing the longest Lyndon subsequence that were presented at IWOCA 2022. We also showed how to compute the longest bordered and the longest periodic subsequences. The key ingredient is a set of novel tools for computing the longest common subsequences between all prefixes and all suffixes of a text. In addition, for the longest bordered subsequence we established a conditional lower bound that matches our quadratic running time. In the second area, we studied common NP-hard problems that take strings as input, leveraging answer set programming solvers. We proved that finding the smallest run-length compressed straight-line program (RLSLP) is NP-hard for unbounded alphabet sizes, and we adapted this proof to the problem of finding the smallest collage system. We also devised a MAX-SAT encoding for computing the smallest RLSLP. In the third area, we advanced the construction of compressed indexes, in practice for block trees and in theory for the parameterized Burrows-Wheeler transform. For the latter, we also showed that the transform can be adapted to circular pattern matching by changing the encoding.
Current Status of Research Progress	2: Research is progressing mostly as planned. Reason: We conducted the research for fiscal year 2023 as planned and were able to complete most of the planned research by the end of the grant term in fiscal year 2023.
Strategy for Future Research Activity	As the grant's term ended in fiscal year 2023, we are now preparing to apply for a new grant for fiscal year 2025. This research has opened new paths for further exploration in string regularities and compressed indexes, which we intend to pursue in the coming years. While our main attention has been on text indexing data structures for classic pattern matching, extended pattern matching queries remain largely unexplored. In response, we aim to build on several concepts discovered during our recent research and combine them with state-of-the-art indexing techniques tailored to classic pattern matching. We anticipate that the resulting indexing methods will find practical applications in scenarios where conventional pattern matching is too restrictive and more flexible matching criteria are needed.

Reports

(3 results)

Research Products

(56 results)


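All: Int'l Joint Research (15 results), Journal Article (28 results; of which internationally co-authored: 28, peer reviewed: 28, open access: 15), Presentation (10 results; of which at international conferences: 1), Remarks (3 results)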

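[Int'l Joint Research] MPI Saarbruecken/Karlsruhe Institute of Technology/University of Muenster (Germany)
- Related Report
  2023 Research-status Report
[Int'l Joint Research] University of Helsinki (Finland)
- Related Report
  2023 Research-status Report
[Int'l Joint Research] Nicolaus Copernicus University in Torun (Poland)
- Related Report
  2023 Research-status Report
[Int'l Joint Research] Dalhousie University (Canada)
- Related Report
  2022 Research-status Report
[Int'l Joint Research] University of A Coruna (Spain)
- Related Report
  2022 Research-status Report
[Int'l Joint Research] University of Chile (Chile)
- Related Report
  2022 Research-status Report
[Int'l Joint Research] Max Planck Institute for Informatics (Germany)
- Related Report
  2022 Research-status Report
[Int'l Joint Research] University of Helsinki (Finland)
- Related Report
  2022 Research-status Report
[Int'l Joint Research]
- Related Report
  2022 Research-status Report
[Int'l Joint Research] Travis Gagie (Canada)
- Related Report
  2021 Research-status Report
[Int'l Joint Research] Nicola Prezza (Italy)
- Related Report
  2021 Research-status Report
[Int'l Joint Research] Gonzalo Navarro (Chile)
- Related Report
  2021 Research-status Report
[Int'l Joint Research] Marcin Piatkowski (Poland)
- Related Report
  2021 Research-status Report
[Int'l Joint Research] Robert W. Irving/Lorna Love (United Kingdom)
- Related Report
  2021 Research-status Report
[Int'l Joint Research]
- Related Report
  2021 Research-status Report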
[Journal Article] Computing Longest Lyndon Subsequences and Longest Common Lyndon Subsequences (2024)
- Author(s)
  Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Koeppl, Simon J. Puglisi
- Journal Title
  Algorithmica
  Volume: 86 Issue: 3 Pages: 735-756
- DOI
  10.1007/s00453-023-01125-z
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Extending the Parameterized Burrows-Wheeler Transform (2024)
- Author(s)
  Eric M. Osterkamp, Dominik Koeppl
- Journal Title
  Proceedings of DCC
  Volume: - Pages: 143-152
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] On the Hardness of Smallest RLSLPs and Collage Systems (2024)
- Author(s)
  Akiyoshi Kawamoto, Tomohiro I, Dominik Koeppl, Hideo Bannai
- Journal Title
  Proceedings of DCC
  Volume: - Pages: 243-252
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Constructing and Indexing the Bijective and Extended Burrows-Wheeler Transform (2024)
- Author(s)
  Hideo Bannai, Juha Kaerkkaeinen, Dominik Koeppl, Marcin Piatkowski
- Journal Title
  Inf. Comput.
  Volume: 297 Pages: 1-30
- DOI
  10.1016/j.ic.2024.105153
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Encoding Hard String Problems with Answer Set Programming (2023)
- Author(s)
  Dominik Koeppl
- Journal Title
  Proceedings of CPM
  Volume: 259
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Longest bordered and periodic subsequences (2023)
- Author(s)
  Hideo Bannai, Tomohiro I, Dominik Koeppl
- Journal Title
  Inf. Process. Lett.
  Volume: 182 Pages: 1-6
- DOI
  10.1016/j.ipl.2023.106398
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Faster Block Tree Construction (2023)
- Author(s)
  Dominik Koeppl, Florian Kurpicz, Daniel Meyer
- Journal Title
  Proceedings of ESA
  Volume: 274
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Dynamic Skyline Computation with LSD Trees (2023)
- Author(s)
  Dominik Koeppl
- Journal Title
  Analytics
  Volume: 2 Issue: 1 Pages: 146-162
- DOI
  10.3390/analytics2010009
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Space-efficient Huffman codes revisited (2023)
- Author(s)
  Szymon Grabowski, Dominik Koeppl
- Journal Title
  Information Processing Letters
  Volume: 179 Pages: 1-8
- DOI
  10.1016/j.ipl.2022.106274
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Graph Compression for Adjacency-Matrix Multiplication (2022)
- Author(s)
  Alexandre P. Francisco, Travis Gagie, Dominik Koeppl, Susana Ladra, Gonzalo Navarro
- Journal Title
  SN Computer Science
  Volume: 3 Issue: 3 Pages: 1-8
- DOI
  10.1007/s42979-022-01084-2
- Related Report
  2022 Research-status Report, 2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Computing Longest (Common) Lyndon Subsequences (2022)
- Author(s)
  Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Koeppl, Simon J. Puglisi
- Journal Title
  Proc. 33rd International Workshop on Combinatorial Algorithms (IWOCA) 2022
  Volume: - Pages: 128-142
- DOI
  10.1007/978-3-031-06678-8_10
- ISBN
  9783031066771, 9783031066788
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Space-Efficient B Trees via Load-Balancing (2022)
- Author(s)
  Tomohiro I, Dominik Koeppl
- Journal Title
  Proc. 33rd International Workshop on Combinatorial Algorithms (IWOCA) 2022
  Volume: - Pages: 327-340
- DOI
  10.1007/978-3-031-06678-8_24
- ISBN
  9783031066771, 9783031066788
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Linking Off-Road Points to Routing Networks (2022)
- Author(s)
  Dominik Koeppl
- Journal Title
  Algorithms
  Volume: 15 Issue: 5 Pages: 1-15
- DOI
  10.3390/a15050163
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Fast and Simple Compact Hashing via Bucketing (2022)
- Author(s)
  Dominik Koeppl, Simon J. Puglisi, Rajeev Raman
- Journal Title
  Algorithmica
  Volume: 84 Issue: 9 Pages: 2735-2766
- DOI
  10.1007/s00453-022-00996-y
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Computing the Parameterized Burrows-Wheeler Transform Online (2022)
- Author(s)
  Daiki Hashimoto, Diptarama Hendrian, Dominik Koeppl, Ryo Yoshinaka, Ayumi Shinohara
- Journal Title
  Proceedings of SPIRE
  Volume: 13617 Pages: 70-85
- DOI
  10.1007/978-3-031-20643-6_6
- ISBN
  9783031206429, 9783031206436
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Accessing the Suffix Array via $\phi^{-1}$-Forest (2022)
- Author(s)
  Christina Boucher, Dominik Koeppl, Herman Perera, Massimiliano Rossi
- Journal Title
  Proceedings of SPIRE
  Volume: 13617 Pages: 86-98
- DOI
  10.1007/978-3-031-20643-6_7
- ISBN
  9783031206429, 9783031206436
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Computing NP-hard Repetitiveness Measures via MAX-SAT (2022)
- Author(s)
  Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Koeppl, Takaaki Nishimoto
- Journal Title
  Proceedings of ESA
  Volume: 244
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices (2022)
- Author(s)
  Paolo Ferragina, Giovanni Manzini, Travis Gagie, Dominik Koeppl, Gonzalo Navarro, Manuel Striani, Francesco Tosoni
- Journal Title
  Proc. VLDB
  Volume: 15 Issue: 10 Pages: 2175-2187
- DOI
  10.14778/3547305.3547321
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Inferring Spatial Distance Rankings with Partial Knowledge on Routing Networks (2022)
- Author(s)
  Dominik Koeppl
- Journal Title
  Information
  Volume: 13 Issue: 4 Pages: 168
- DOI
  10.3390/info13040168
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Computing Lexicographic Parsings (2022)
- Author(s)
  Dominik Koeppl
- Journal Title
  Proc. DCC
  Volume: 2022 Pages: 232-241
- DOI
  10.1109/dcc52660.2022.00031
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances (2022)
- Author(s)
  Dominik Koeppl, Gonzalo Navarro, Nicola Prezza
- Journal Title
  Proc. DCC
  Volume: 2022 Pages: 83-92
- DOI
  10.1109/dcc52660.2022.00016
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns (2022)
- Author(s)
  Jin Jie Deng, Wing-Kai Hon, Dominik Koeppl, Kunihiko Sadakane
- Journal Title
  Proc. DCC
  Volume: 2022 Pages: 63-72
- DOI
  10.1109/dcc52660.2022.00014
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] c-trie++: A dynamic trie tailored for fast prefix searches (2021)
- Author(s)
  Kazuya Tsuruta, Dominik Koeppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
- Journal Title
  Information and Computation
  Volume: - Pages: 104794
- DOI
  10.1016/j.ic.2021.104794
- Related Report
  2022 Research-status Report, 2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Reversed Lempel-Ziv Factorization with Suffix Trees (2021)
- Author(s)
  Dominik Koeppl
- Journal Title
  Algorithms
  Volume: 14 Issue: 6 Pages: 161
- DOI
  10.3390/a14060161
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A Separation of $\gamma$ and $b$ via Thue-Morse Words (2021)
- Author(s)
  Hideo Bannai, Mitsuru Funakoshi, Tomohiro I, Dominik Koeppl, Takuya Mieno, Takaaki Nishimoto
- Journal Title
  Proceedings of the 28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)
  Volume: LNCS 12944 Pages: 167-178
- DOI
  10.1007/978-3-030-86692-1_14
- ISBN
  9783030866914, 9783030866921
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Grammar Index by Induced Suffix Sorting (2021)
- Author(s)
  Tooru Akagi, Dominik Koeppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
- Journal Title
  Proceedings of 28th International Symposium on String Processing and Information Retrieval
  Volume: 12944 Pages: 85-99
- DOI
  10.1007/978-3-030-86692-1_8
- ISBN
  9783030866914, 9783030866921
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Extracting the Sparse Longest Common Prefix Array from the Suffix Binary Search Tree (2021)
- Author(s)
  Tomohiro I, Robert Irving, Dominik Koeppl, Lorna Love
- Journal Title
  Proc. SPIRE
  Volume: 12944 Pages: 143-150
- DOI
  10.1007/978-3-030-86692-1_12
- ISBN
  9783030866914, 9783030866921
- Related Report
  2021 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time (2021)
- Author(s)
  Hideo Bannai, Juha Kaerkkaeinen, Dominik Koeppl, Marcin Piatkowski
- Journal Title
  32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
  Volume: 191
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Computing Compression Measures with Answer Set Programming (2024)
- Author(s)
  Dominik Koeppl, Mutsunori Banbara (番原睦則)
- Organizer
  Local Proceedings of the LA Symposium Winter 2023
- Related Report
  2023 Research-status Report
[Presentation] Extending the Parameterized Burrows-Wheeler Transform (2023)
- Author(s)
  Eric M. Osterkamp, Dominik Koeppl
- Organizer
  Local Proceedings of the Computation Workshop (コンピュテーション研究会)
- Related Report
  2023 Research-status Report
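[Presentation] Compression Sensitivity of lex-parse (2023)
- Author(s)
  Yuto Nakashima, Dominik Koeppl, Mitsuru Funakoshi, Shunsuke Inenaga
- Organizer
  Local Proceedings of the 195th Algorithms Workshop (アルゴリズム研究会)
- Related Report
  2023 Research-status Report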
[Presentation] Encoding Hard String Problems with Answer Set Programming (2023)
- Author(s)
  Dominik Koeppl
- Organizer
  Sequences in London
- Related Report
  2023 Research-status Report
- International Conference
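[Presentation] Enumerating Smallest String Attractors with ZDDs (2023)
- Author(s)
  Yuta Fujioka (藤岡祐太), Toshiki Saitoh (斎藤寿樹), Dominik Koeppl
- Organizer
  Operations Research Society of Japan, Kyushu Chapter: Exchange Meeting for Young OR Researchers in the Kyushu Area
- Related Report
  2023 Research-status Report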
[Presentation] A Data Structure Emulating the Suffix Array in the r-Index (2023)
- Author(s)
  Christina Boucher, Dominik Koeppl, Herman Perera, Massimiliano Rossi
- Organizer
  Local Proceedings of the LA Symposium Winter 2022
- Related Report
  2022 Research-status Report
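[Presentation] Size Ratio of lex-parse under Different Alphabet Orders (2023)
- Author(s)
  Yuto Nakashima, Dominik Koeppl, Mitsuru Funakoshi, Shunsuke Inenaga
- Organizer
  Local Proceedings of the 191st Algorithms Workshop (アルゴリズム研究会)
- Related Report
  2022 Research-status Report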
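[Presentation] Computing Variants of LZ77 and the LPF Array Based on Suffix Trees (2022)
- Author(s)
  Dominik Koeppl
- Organizer
  Local Proceedings of the Computation Workshop (コンピュテーション研究会)
- Related Report
  2022 Research-status Report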
[Presentation] A Code Representing Lempel-Ziv Factor Distances with High-Order Entropy (2022)
- Author(s)
  Dominik Koeppl, Gonzalo Navarro, Nicola Prezza
- Organizer
  Local Proceedings of the 190th Algorithms Workshop (アルゴリズム研究会)
- Related Report
  2022 Research-status Report
[Presentation] Space-Efficient Algorithms for Constructing Lexicographic Parsings (2022)
- Author(s)
  Dominik Koeppl
- Organizer
  COMP2021-28
- Related Report
  2021 Research-status Report
[Remarks] Personal homepage
- URL
  https://dkppl.de/
- Related Report
  2023 Research-status Report
[Remarks] Personal homepage
- URL
  https://dkppl.de/
- Related Report
  2022 Research-status Report
[Remarks] Personal homepage
- URL
  https://dkppl.de/
- Related Report
  2021 Research-status Report
