2022 Fiscal Year Annual Research Report
Resource-Constraint Privacy-Aware Data Structures Tackling Problems in Bioinformatics
Publicly Offered Research
Project Area | Creation and Organization of Innovative Algorithmic Foundations for Leading Social Innovations |
Project/Area Number |
21H05847
|
Research Institution | Tokyo Medical and Dental University |
Principal Investigator |
Koeppl Dominik Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)
|
Project Period (FY) |
2021-09-10 – 2023-03-31
|
Keywords | data compression / genetic data indexes / resource constraints / text indexing / matching statistics / parameterized matching / suffix array access |
Outline of Annual Research Achievements |
To index biological data in a meaningful way, we presented two new approaches at SPIRE'22. The first augments the r-index to speed up random access to the suffix array. Such access is usually performed by applying the Phi array sequentially, which has proven slow in practice. We slightly improved the access time by simulating the predecessor queries with a walk on a labelled graph, on which some of the predecessor queries can be omitted. The second approach concerns parameterized pattern matching, an extension of classic pattern matching. Here, we proposed the first efficient algorithm for computing the parameterized Burrows-Wheeler transform online. For computing matching statistics, we improved the practical running time by using the r-index augmented with two helper data structures: a grammar supporting longest common extension (LCE) queries, and the thresholds array. Bannai et al. [TCS'20] showed how to compute matching statistics with the r-index. Building on this, we provided two successive improvements: first, a software tool called PHONI, released two years ago, and second, a recent practical improvement that skips some LCE queries by storing additional LCE values of the thresholds. The small increase in space is justified by a remarkable improvement in query time, since the LCE queries answered by the grammar tend to be the bottleneck of the whole algorithm.
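For readers unfamiliar with the term, the following minimal sketch illustrates what matching statistics are: for each position i of a pattern, the length of the longest prefix of the pattern suffix starting at i that occurs somewhere in the text. It is a brute-force illustration only, not the r-index-based method of Bannai et al. or PHONI; the function name `matching_statistics` and the example strings are assumptions made for this sketch.

```python
def matching_statistics(text: str, pattern: str) -> list[int]:
    """Brute-force matching statistics (lengths only).

    ms[i] is the length of the longest prefix of pattern[i:] that
    occurs as a substring of text. Hypothetical helper for illustration;
    it uses plain substring search instead of any compressed index.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match by one character while the longer candidate
        # pattern[i : i + length + 1] still occurs in the text.
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in text):
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    # Example strings chosen for illustration only.
    print(matching_statistics("banana", "nanab"))
```

Index-based approaches such as those described above compute the same values without scanning the text for every extension, which is where the LCE queries and the thresholds array come into play.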
|
Research Progress Status |
Not entered because FY2022 is the final fiscal year.
|
Strategy for Future Research Activity |
Not entered because FY2022 is the final fiscal year.
|