Resource-Constraint Privacy-Aware Data Structures Tackling Problems in Bioinformatics
Publicly Offered Research
Project Area | Creation and Organization of Innovative Algorithmic Foundations for Leading Social Innovations |
Project/Area Number |
21H05847
|
Research Category |
Grant-in-Aid for Transformative Research Areas (A)
|
Allocation Type | Single-year Grants |
Review Section |
Transformative Research Areas, Section (IV)
|
Research Institution | Tokyo Medical and Dental University |
Principal Investigator |
Koeppl Dominik, Tokyo Medical and Dental University, M&D Data Science Center, Assistant Professor (50897395)
|
Project Period (FY) |
2021-09-10 – 2023-03-31
|
Project Status |
Completed (Fiscal Year 2022)
|
Budget Amount |
¥5,200,000 (Direct Cost: ¥4,000,000, Indirect Cost: ¥1,200,000)
Fiscal Year 2022: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2021: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
|
Keywords | data compression / genetic data indexes / resource constraints / text indexing / matching statistics / parameterized matching / suffix array access / privacy-aware computing / factorization algorithms / LZ78 compression / lexicographic parse / sparse suffix sorting / grammar compression / compressed data / memory-efficiency / hashing / biological data indexing / space-efficiency / privacy-aware / lossless compression / compressed indexing |
Outline of Research at the Start |
Recent advances in technology have made it possible to collect vast amounts of biological data valuable for studying genetic diseases and devising individually targeted therapies. Unfortunately, while the collection of such data has gained considerable momentum, we are unaware of solutions that can handle the collected data efficiently while supporting biologically important queries under the restriction that privacy is respected. Such a solution could make it possible to discover insights into diseases and side effects of medical treatments caused by genetic variations.
|
Outline of Annual Research Achievements |
To index biological data in a meaningful way, we presented two new approaches at SPIRE'22. The first augments the r-index to speed up random access to the suffix array. Such access is usually implemented by applying the Phi array sequentially, which has been observed to be slow in practice. We slightly improved the time by simulating the predecessor queries with a walk on a labelled graph, on which some of the predecessor queries can be omitted. The second concerns parameterized pattern matching, an extension of classic pattern matching, for which we proposed the first efficient algorithm for computing the parameterized Burrows-Wheeler transform online. For computing matching statistics, we improved the practical running time by using the r-index augmented with two helper data structures: a grammar supporting longest common extension (LCE) queries, and the thresholds array. While Bannai et al. [TCS'20] showed how to compute matching statistics with the r-index, we provided two successive improvements: first with a software called PHONI two years ago, and then with a recent practical improvement that skips some LCE queries by storing additional LCE values of the thresholds. The small increase in space is justified by a remarkable improvement in query time, since the LCE queries answered by the grammar tend to be the bottleneck of the whole algorithm.
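The sequential application of Phi mentioned above can be illustrated with a minimal sketch. It assumes the standard definition Phi(SA[i]) = SA[i-1] and stores Phi as a plain dictionary over a naively built suffix array; the r-index instead keeps Phi in compressed form and evaluates it with predecessor queries, which is not reproduced here. The function names `suffix_array`, `build_phi`, and `access_via_phi` are hypothetical and chosen only for this illustration.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_phi(sa):
    """Phi maps SA[i] to SA[i-1]; the smallest suffix has no predecessor.

    Assumption: an uncompressed dictionary, unlike the r-index representation.
    """
    return {sa[i]: sa[i - 1] for i in range(1, len(sa))}


def access_via_phi(rank, sample_rank, sample_value, phi):
    """Recover SA[rank] from a stored sample SA[sample_rank] = sample_value.

    Requires sample_rank >= rank; applies Phi (sample_rank - rank) times,
    which is the sequential walk that becomes slow for distant samples.
    """
    value = sample_value
    for _ in range(sample_rank - rank):
        value = phi[value]
    return value


text = "banana$"
sa = suffix_array(text)
phi = build_phi(sa)
last = len(sa) - 1
# Recover every suffix array entry from the single sample SA[last].
recovered = [access_via_phi(i, last, sa[last], phi) for i in range(len(sa))]
```

Each access costs as many Phi applications as the distance to the sample in suffix array order, which is why reducing the cost of the individual steps matters in practice.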
|
Research Progress Status |
Not entered, because fiscal year 2022 (Reiwa 4) is the final year.
|
Strategy for Future Research Activity |
Not entered, because fiscal year 2022 (Reiwa 4) is the final year.
|
Report
(2 results)
Research Products
(39 results)
[Journal Article] Computing Longest (Common) Lyndon Subsequences (2022)
Author(s)
Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Koeppl, Simon J. Puglisi
-
Journal Title
Proc. 33rd International Workshop on Combinatorial Algorithms (IWOCA) 2022
Volume: -
Pages: 128-142
DOI
ISBN
9783031066771, 9783031066788
Related Report
Peer Reviewed / Int'l Joint Research
-
[Journal Article] Grammar Index by Induced Suffix Sorting (2021)
Author(s)
Tooru Akagi, Dominik Koeppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
-
Journal Title
Proceedings of 28th International Symposium on String Processing and Information Retrieval
Volume: 12944
Pages: 85-99
DOI
ISBN
9783030866914, 9783030866921
Related Report
Peer Reviewed / Int'l Joint Research
-
[Journal Article] A Separation of γ and b via Thue-Morse Words (2021)
Author(s)
Bannai Hideo, Funakoshi Mitsuru, I Tomohiro, Koeppl Dominik, Mieno Takuya, Nishimoto Takaaki
-
Journal Title
Proceedings of the 28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)
Volume: LNCS 12944
Pages: 167-178
DOI
ISBN
9783030866914, 9783030866921
Related Report
Peer Reviewed / Int'l Joint Research