2021 Fiscal Year Annual Research Report

Resource-Constraint Privacy-Aware Data Structures Tackling Problems in Bioinformatics

Publicly Offered Research

Project Area	Creation and Organization of Innovative Algorithmic Foundations for Leading Social Innovations
Project/Area Number	21H05847
Research Institution	Tokyo Medical and Dental University
Principal Investigator	Koeppl Dominik, Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)
Project Period (FY)	2021-09-10 – 2023-03-31
Keywords	factorization algorithms / LZ78 compression / lexicographic parse / sparse suffix sorting / grammar compression / compressed data / memory-efficiency / hashing
Outline of Annual Research Achievements	Aiming to improve factorization algorithms and text indexing in resource-constrained environments, we gained new insights into both topics. For the first topic (factorization algorithms), we improved the practical computation of the LZ78 parsing in low memory by using algorithmically engineered trie data structures. The main idea was to leverage compact hashing techniques. We also showed that memory usage can be reduced further if we are allowed to output a variation of the factorization that stores a compressed version of a hash table. We later studied the computation of lexicographic parsings, which depend on the order of the suffixes in the text. There, we proposed a sparse Phi array that stores enough information to represent the whole suffix array. While restoring the suffix array from the sparse Phi array seems to be inefficient, the storage layout of this small data structure suffices to efficiently compute lexicographic parsings that use lexicographically neighboring suffixes as references. For the second topic (working with sparse or compressed indexes), we reviewed the suffix binary search tree, a balanced search tree maintaining the order of designated suffixes, as a sparse indexing data structure capable of extracting the sparse suffix array and the sparse longest common prefix array. We also devised an indexing data structure built on top of a grammar to accelerate pattern matching by scanning for non-terminals that each cover several to many terminal symbols instead of scanning single terminal symbols.
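As a minimal illustration of the LZ78 parsing mentioned in the outline above, the following Python sketch stores the LZ78 trie in a hash table keyed by (parent node, character). It only shows the textbook factorization: the function names `lz78_factorize` and `lz78_decode` are hypothetical, and a plain Python dict stands in for the compact hash tables of the engineered tries, so memory usage is not representative of the project's results.

```python
def lz78_factorize(text):
    """Return the LZ78 factors of text as (referred_factor, new_char) pairs.

    Factor index 0 denotes the empty factor (the trie root). If the text
    ends inside an already known phrase, the last pair carries "" as its
    character.
    """
    trie = {}      # (parent_node, char) -> child_node, i.e. the trie edges
    factors = []
    node = 0       # current trie node; 0 is the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the phrase
        else:
            factors.append((node, ch))        # emit a new factor
            trie[(node, ch)] = len(factors)   # new node is named by its factor
            node = 0                          # restart at the root
    if node != 0:
        factors.append((node, ""))            # leftover phrase without new char
    return factors


def lz78_decode(factors):
    """Rebuild the text from the pairs produced by lz78_factorize."""
    phrases = [""]                            # phrase 0 is the empty factor
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


if __name__ == "__main__":
    pairs = lz78_factorize("abababab")
    print(pairs)
    print(lz78_decode(pairs))
```

Each trie node is identified by the index of the factor that created it, so the hash table is the only data structure needed besides the output itself.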
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: We conducted the research for fiscal year 2021 as planned, and could complete most of our planned research by the end of the grant period in fiscal year 2022.
Strategy for Future Research Activity	The research raised several questions that we want to investigate in the future. For the LZ78 trie computation, we showed how to also compute the LZW factorization, which is a practical variation of the LZ78 factorization. However, there is a whole family of LZ78-like factorizations, including LZD and LZMW, for which no such space-efficient algorithm exists yet. We ask to what extent we can generalize our techniques to compute other factorizations of this kind. Regarding the proposed sparse Phi array representation, we have left its construction as an open problem. While a naive construction is straightforward, a space-efficient construction seems to come at the cost of running time. Advances in the r-index data structure have led to alternative representations of the Phi array, which seem to be good candidates for studying construction techniques. Finally, for the proposed index on grammar-compressed texts, we wonder whether we can attain a space/time trade-off by using grammars that improve locality-sensitivity at the expense of storing more information. Several other open problems related to the efficient construction of useful data structures such as the sparse Phi array led us to propose an extension of this research project, which resulted in a new grant entitled "Constructing Compressed Indexes for Biological Sequences" with grant number JP23H04378.
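To make the Phi array concrete, here is a minimal Python sketch of the naive construction of the full (not sparse) Phi array from a suffix array, using the common definition Phi[SA[i]] = SA[i-1]. The helper names are hypothetical, the suffix array is built by plain sorting for brevity, and the sparse representation proposed in the project is not reproduced here.

```python
def suffix_array(text):
    """Naive suffix array: sort all suffix start positions by their suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def phi_array(sa):
    """Phi[SA[i]] = SA[i-1]: the suffix preceding each suffix in
    lexicographic order. The smallest suffix has no predecessor (None)."""
    phi = [None] * len(sa)
    for rank in range(1, len(sa)):
        phi[sa[rank]] = sa[rank - 1]
    return phi


def suffix_array_from_phi(phi, last):
    """Recover the suffix array by following Phi from the lexicographically
    largest suffix (text position `last`) down to the smallest one."""
    sa = []
    pos = last
    while pos is not None:
        sa.append(pos)
        pos = phi[pos]
    sa.reverse()
    return sa


if __name__ == "__main__":
    sa = suffix_array("banana")
    phi = phi_array(sa)
    print(sa)
    print(phi)
    print(suffix_array_from_phi(phi, sa[-1]))
```

The walk in `suffix_array_from_phi` visits every text position once, which shows why the Phi array together with the position of the largest suffix determines the whole suffix array.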

Research Products
(19 results)


Int'l Joint Research (5 results) / Journal Article (10 results; of which Int'l Joint Research: 10, Peer Reviewed: 10, Open Access: 4) / Presentation (3 results; of which Int'l Joint Research: 1) / Remarks (1 result)

[Int'l Joint Research] Nicolaus Copernicus University (Poland)
- Country Name
  POLAND
- Counterpart Institution
  Nicolaus Copernicus University
[Int'l Joint Research] University of Glasgow/University of Leicester (United Kingdom)
- Country Name
  UNITED KINGDOM
- Counterpart Institution
  University of Glasgow/University of Leicester
[Int'l Joint Research] Millennium Institute/Tecnica Federico Santa Maria/University of Chile (Chile)
- Country Name
  CHILE
- Counterpart Institution
  Millennium Institute/Tecnica Federico Santa Maria/University of Chile
[Int'l Joint Research] Baker Heart and Diabetes Institute (Australia)
- Country Name
  AUSTRALIA
- Counterpart Institution
  Baker Heart and Diabetes Institute
[Int'l Joint Research] National Tsing Hua University (Taiwan)
- Country Name
  OTHER COUNTRIES/REGIONS
- Counterpart Institution
  National Tsing Hua University
[Journal Article] FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns (2022)
- Author(s)
  Jin Jie Deng and Wing-Kai Hon and Dominik Koeppl and Kunihiko Sadakane
- Journal Title
  
  Proceedings of DCC
  
  Volume: 26 Pages: 63-72
- DOI
  10.1109/DCC52660.2022.00014
- Peer Reviewed / Int'l Joint Research
[Journal Article] HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances (2022)
- Author(s)
  Dominik Koeppl and Gonzalo Navarro and Nicola Prezza
- Journal Title
  
  Proceedings of DCC
  
  Volume: 26 Pages: 83-92
- DOI
  10.1109/DCC52660.2022.00016
- Peer Reviewed / Int'l Joint Research
[Journal Article] Computing Lexicographic Parsings (2022)
- Author(s)
  Dominik Koeppl
- Journal Title
  
  Proceedings of DCC
  
  Volume: 26 Pages: 232-241
- DOI
  10.1109/DCC52660.2022.00031
- Peer Reviewed / Int'l Joint Research
[Journal Article] Inferring Spatial Distance Rankings with Partial Knowledge on Routing Networks (2022)
- Author(s)
  Dominik Koeppl
- Journal Title
  
  Information
  
  Volume: 13(4) Pages: 1-28
- DOI
  10.3390/info13040168
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Reversed Lempel-Ziv Factorization with Suffix Trees (2021)
- Author(s)
  Dominik Koeppl
- Journal Title
  
  Algorithms
  
  Volume: 14(6) Pages: 1-26
- DOI
  10.3390/a14060161
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Constructing the Bijective and the Extended Burrows-Wheeler Transform in Linear Time (2021)
- Author(s)
  Hideo Bannai and Juha Kaerkkaeinen and Dominik Koeppl and Marcin Piątkowski
- Journal Title
  
  Proceedings of CPM
  
  Volume: 191 Pages: 7:1-7:16
- DOI
  10.4230/LIPIcs.CPM.2021.7
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Extracting the Sparse Longest Common Prefix Array from the Suffix Binary Search Tree (2021)
- Author(s)
  Tomohiro I and Dominik Koeppl and Robert Irving and Lorna Love
- Journal Title
  
  Proceedings of SPIRE
  
  Volume: 12944 Pages: 143-150
- DOI
  10.1007/978-3-030-86692-1_12
- Peer Reviewed / Int'l Joint Research
[Journal Article] Grammar Index by Induced Suffix Sorting (2021)
- Author(s)
  Tooru Akagi and Dominik Koeppl and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda
- Journal Title
  
  Proceedings of SPIRE
  
  Volume: 12944 Pages: 85-99
- DOI
  10.1007/978-3-030-86692-1_8
- Peer Reviewed / Int'l Joint Research
[Journal Article] A Separation of γ and b via Thue-Morse Words (2021)
- Author(s)
  Hideo Bannai and Mitsuru Funakoshi and Tomohiro I and Dominik Koeppl and Takuya Mieno and Takaaki Nishimoto
- Journal Title
  
  Proceedings of SPIRE
  
  Volume: 12944 Pages: 167-178
- DOI
  10.1007/978-3-030-86692-1_14
- Peer Reviewed / Int'l Joint Research
[Journal Article] Engineering Practical Lempel-Ziv Tries (2021)
- Author(s)
  Diego Arroyuelo and Rodrigo Cánovas and Johannes Fischer and Dominik Koeppl and Marvin Loebel and Gonzalo Navarro and Rajeev Raman
- Journal Title
  
  ACM JEA
  
  Volume: 26 Pages: 1.14:1-1.14:47
- DOI
  10.1145/3481638
- Peer Reviewed / Open Access / Int'l Joint Research
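[Presentation] Fast Computation of NP-Hard Compression Measures Using SAT Solvers (2022)
- Author(s)
  Hideo Bannai and Keisuke Goto and Masakazu Ishihata and Shunsuke Kanda and Dominik Koeppl and Takaaki Nishimoto
- Organizer
  Technical Reports of the Japanese Society for Artificial Intelligence, Special Interest Group on Fundamental Problems in Artificial Intelligence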
[Presentation] Space-Efficient Algorithms for Constructing Lexicographic Parses (2021)
- Author(s)
  Dominik Koeppl
- Organizer
  Local Proceedings of the Computation Research Meeting (コンピュテーション研究会)
[Presentation] Computation of Variations of the LZ77 factorization and the LPF Array with Suffix Trees (2021)
- Author(s)
  Dominik Koeppl
- Organizer
  WCTA
- Int'l Joint Research
[Remarks] Personal Homepage
- URL
  https://dkppl.de/
