FY2021 Annual Research Report

Resource-Constraint Privacy-Aware Data Structures Tackling Problems in Bioinformatics

Publicly Offered Research

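Research Area	Creation and Organization of Innovative Algorithmic Foundations for Social Advancement
Project/Area Number	21H05847
Research Institution	Tokyo Medical and Dental University
Principal Investigator	Koeppl Dominik, Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)
Project Period (FY)	2021-09-10 – 2023-03-31
Keywords	factorization algorithms / LZ78 compression / lexicographic parse / sparse suffix sorting / grammar compression / compressed data / memory-efficiency / hashing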
Outline of Annual Research Achievements	Aiming to improve factorization algorithms and text indexing in resource-constrained environments, we gained new insights into both topics. For the first topic (factorization algorithms), we improved the practical computation of the LZ78 parsing in low memory by using algorithmically engineered trie data structures. The main idea was to leverage compact hashing techniques. We also showed that the memory usage can be reduced further if the algorithm is allowed to output a variation of the factorization that stores a compressed version of a hash table. We later also studied the computation of lexicographic parsings, which depend on the order of the suffixes of the text. There, we proposed a sparse Phi array that stores enough information to represent the whole suffix array. While restoring the suffix array from the sparse Phi array seems to be inefficient, the storage layout of this small data structure suffices to efficiently compute lexicographic parsings that use lexicographically neighboring suffixes as references. For the second topic (working with sparse or compressed indexes), we revisited the suffix binary search tree, a balanced search tree maintaining the order of designated suffixes, as a sparse indexing data structure capable of extracting the sparse suffix array and the sparse longest common prefix array. We also devised an indexing data structure built on top of a grammar that accelerates pattern matching by scanning non-terminals, each covering several to many terminal symbols, instead of single terminal symbols.
Current Status of Research Progress (Category)	2: Progressing mostly smoothly. Reason: We conducted the research for fiscal year 2021 as planned and were able to complete most of our planned research by the end of the grant period in fiscal year 2022.
Strategy for Future Research Activity	The research raised several questions that we want to investigate in the future. For the LZ78 trie computation, we showed how to also compute the LZW factorization, a practical variation of the LZ78 factorization. However, there is a whole family of LZ78-like factorizations, including LZD and LZMW, for which no such space-efficient algorithm exists yet. We ask to what extent our techniques generalize to computing these other kinds of factorizations. Regarding the proposed sparse Phi array representation, we have left its construction as an open problem. While a naive construction is straightforward, a space-efficient construction seems to come at the cost of running time. Advances in the r-index data structure have led to alternative representations of the Phi array, which seem to be good candidates for studying construction techniques. Finally, for the proposed index on grammar-compressed texts, we wonder whether we can attain a space/time trade-off by using grammars that improve locality sensitivity at the expense of storing more information. Several other open problems related to the efficient construction of useful data structures such as the sparse Phi array led us to propose an extension of this research project, which resulted in a new grant entitled "Constructing Compressed Indexes for Biological Sequences" with grant number JP23H04378.
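
As a point of reference for the LZ78 factorization discussed above, the following is a minimal Python sketch that computes it with a trie stored in an ordinary hash table. It only illustrates the textbook factorization under simple assumptions; it is not the compact-hashing implementation engineered in the project, and the function names `lz78_factorize` and `lz78_decode` are hypothetical.

```python
def lz78_factorize(text):
    """Return the LZ78 factors of `text` as (referred_index, new_char) pairs.

    Index 0 refers to the empty string (the trie root); index k >= 1 refers
    to the k-th factor. If the text ends inside an already known phrase,
    the last pair has new_char set to None.
    """
    # The LZ78 trie as a hash table: (parent_node, char) -> child_node.
    # Illustrative only: a plain dict, not a compact hash table.
    trie = {}
    factors = []
    node = 0  # current trie node, starting at the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # the current factor can be extended
        else:
            factors.append((node, ch))
            trie[(node, ch)] = len(factors)  # new node = index of this factor
            node = 0  # restart at the root for the next factor
    if node != 0:
        factors.append((node, None))  # trailing factor without a new character
    return factors


def lz78_decode(factors):
    """Rebuild the text from the pairs produced by lz78_factorize."""
    phrases = [""]  # phrases[0] is the empty string of the root
    for ref, ch in factors:
        phrases.append(phrases[ref] + (ch or ""))
    return "".join(phrases[1:])


if __name__ == "__main__":
    example = "abababab"
    pairs = lz78_factorize(example)
    print(pairs)
    print(lz78_decode(pairs) == example)
```

On the input `abababab`, this sketch produces the five factors a, b, ab, aba, b, and `lz78_decode` restores the original text from them. Each trie node is identified by the index of the factor that created it, so the hash table is the only data structure needed besides the output.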

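Research Products

(19 results)

International joint research (5); journal articles (10, of which 10 internationally co-authored, 10 peer-reviewed, 4 open access); presentations (3, of which 1 at an international conference); remarks (1)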

[International Joint Research] Nicolaus Copernicus University (Poland)
- Country
  Poland
- Counterpart Institution
  Nicolaus Copernicus University
[International Joint Research] University of Glasgow/University of Leicester (United Kingdom)
- Country
  United Kingdom
- Counterpart Institution
  University of Glasgow/University of Leicester
[International Joint Research] Millennium Institute/Tecnica Federico Santa Maria/University of Chile (Chile)
- Country
  Chile
- Counterpart Institution
  Millennium Institute/Tecnica Federico Santa Maria/University of Chile
[International Joint Research] Baker Heart and Diabetes Institute (Australia)
- Country
  Australia
- Counterpart Institution
  Baker Heart and Diabetes Institute
[International Joint Research] National Tsing Hua University (Taiwan)
- Country
  Other countries/regions
- Counterpart Institution
  National Tsing Hua University
[Journal Article] FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns (2022)
- Author(s)
  Jin Jie Deng, Wing-Kai Hon, Dominik Koeppl, and Kunihiko Sadakane
- Journal
  Proceedings of DCC
  Volume: 26 Pages: 63-72
- DOI
  10.1109/DCC52660.2022.00014
- Peer Reviewed / International Co-authorship
[Journal Article] HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances (2022)
- Author(s)
  Dominik Koeppl, Gonzalo Navarro, and Nicola Prezza
- Journal
  Proceedings of DCC
  Volume: 26 Pages: 83-92
- DOI
  10.1109/DCC52660.2022.00016
- Peer Reviewed / International Co-authorship
[Journal Article] Computing Lexicographic Parsings (2022)
- Author(s)
  Dominik Koeppl
- Journal
  Proceedings of DCC
  Volume: 26 Pages: 232-241
- DOI
  10.1109/DCC52660.2022.00031
- Peer Reviewed / International Co-authorship
[Journal Article] Inferring Spatial Distance Rankings with Partial Knowledge on Routing Networks (2022)
- Author(s)
  Dominik Koeppl
- Journal
  Information
  Volume: 13(4) Pages: 1-28
- DOI
  10.3390/info13040168
- Peer Reviewed / Open Access / International Co-authorship
[Journal Article] Reversed Lempel-Ziv Factorization with Suffix Trees (2021)
- Author(s)
  Dominik Koeppl
- Journal
  Algorithms
  Volume: 14(6) Pages: 1-26
- DOI
  10.3390/a14060161
- Peer Reviewed / Open Access / International Co-authorship
[Journal Article] Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time (2021)
- Author(s)
  Hideo Bannai, Juha Kaerkkaeinen, Dominik Koeppl, and Marcin Piątkowski
- Journal
  Proceedings of CPM
  Volume: 191 Pages: 7:1-7:16
- DOI
  10.4230/LIPIcs.CPM.2021.7
- Peer Reviewed / Open Access / International Co-authorship
[Journal Article] Extracting the Sparse Longest Common Prefix Array from the Suffix Binary Search Tree (2021)
- Author(s)
  Tomohiro I, Dominik Koeppl, Robert Irving, and Lorna Love
- Journal
  Proceedings of SPIRE
  Volume: 12944 Pages: 143-150
- DOI
  10.1007/978-3-030-86692-1_12
- Peer Reviewed / International Co-authorship
[Journal Article] Grammar Index by Induced Suffix Sorting (2021)
- Author(s)
  Tooru Akagi, Dominik Koeppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda
- Journal
  Proceedings of SPIRE
  Volume: 12944 Pages: 85-99
- DOI
  10.1007/978-3-030-86692-1_8
- Peer Reviewed / International Co-authorship
[Journal Article] A Separation of γ and b via Thue-Morse Words (2021)
- Author(s)
  Hideo Bannai, Mitsuru Funakoshi, Tomohiro I, Dominik Koeppl, Takuya Mieno, and Takaaki Nishimoto
- Journal
  Proceedings of SPIRE
  Volume: 12944 Pages: 167-178
- DOI
  10.1007/978-3-030-86692-1_14
- Peer Reviewed / International Co-authorship
[Journal Article] Engineering Practical Lempel-Ziv Tries (2021)
- Author(s)
  Diego Arroyuelo, Rodrigo Cánovas, Johannes Fischer, Dominik Koeppl, Marvin Loebel, Gonzalo Navarro, and Rajeev Raman
- Journal
  ACM JEA
  Volume: 26 Pages: 1.14:1-1.14:47
- DOI
  10.1145/3481638
- Peer Reviewed / Open Access / International Co-authorship
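[Presentation] Fast Computation of NP-Hard Compression Measures Using SAT Solvers (SATソルバを用いたNP困難な圧縮指標の高速計算) (2022)
- Author(s)/Presenter(s)
  Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Koeppl, and Takaaki Nishimoto
- Organizer
  JSAI Technical Report, Special Interest Group on Fundamental Problems in Artificial Intelligence (人工知能学会研究会資料人工知能基本問題研究会)
[Presentation] Space-Efficient Construction Algorithms for Lexicographic Parses (省領域な lexicographic parse 構築アルゴリズム) (2021)
- Author(s)/Presenter(s)
  Dominik Koeppl
- Organizer
  Local Proceedings of コンピュテーション研究会 (Computation Workshop)
[Presentation] Computation of Variations of the LZ77 factorization and the LPF Array with Suffix Trees (2021)
- Author(s)/Presenter(s)
  Dominik Koeppl
- Organizer
  WCTA
- International Conference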
[Remarks] Personal Homepage
- URL
  https://dkppl.de/
