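Budget Amount *Note |
5,200 thousand yen (direct costs: 4,000 thousand yen; indirect costs: 1,200 thousand yen)
Fiscal year 2022: 2,600 thousand yen (direct costs: 2,000 thousand yen; indirect costs: 600 thousand yen)
Fiscal year 2021: 2,600 thousand yen (direct costs: 2,000 thousand yen; indirect costs: 600 thousand yen)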
|
Outline of Research at the Start |
Recent advances in technology have made it possible to collect vast amounts of biological data that are valuable for studying genetic diseases and devising individually targeted therapies. Unfortunately, while the collection of such data has gathered momentum, we are unaware of solutions that can handle the collected data efficiently while supporting biologically important queries under the restriction that privacy is respected. Such a solution could make it possible to gain insights into diseases and into side effects of medical treatments caused by genetic variations.
|
Outline of Annual Research Achievements |
Toward a meaningful indexing of biological data, we presented two new approaches at SPIRE'22. The first is an augmentation of the r-index that improves the time for random access to the suffix array. Such access is usually performed by sequentially applying the Phi array, which has proven slow in practice. We slightly improved the time by simulating the predecessor queries with a walk on a labelled graph, which lets us omit some of the predecessor queries. The second concerns parameterized pattern matching, an extension of classic pattern matching, for which we proposed the first efficient algorithm for computing the parameterized Burrows-Wheeler transform online. For computing matching statistics, we improved the practical running time by using the r-index augmented with helper data structures, namely a grammar supporting longest common extension (LCE) queries and the thresholds array. While Bannai et al. [TCS'20] showed how to compute matching statistics with the r-index, we provided two successive improvements: first with a software tool called PHONI two years ago, and recently with a practical improvement that skips some LCE queries by storing additional LCE values of the thresholds. The small increase in space is justified by a remarkable improvement in query time, since the LCE queries answered by the grammar tend to be the bottleneck of the whole algorithm.
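For readers unfamiliar with matching statistics, the following minimal Python sketch illustrates what is being computed: for each position i of a pattern, the length of the longest prefix of the pattern's suffix starting at i that occurs somewhere in the text. This is a naive reference computation using plain substring search, not the r-index-based method or PHONI; the function name `matching_statistics` and the example strings are assumptions made for illustration only.

```python
from typing import List


def matching_statistics(text: str, pattern: str) -> List[int]:
    """Naive reference: ms[i] is the length of the longest prefix of
    pattern[i:] that occurs as a substring of text.

    Illustration only; it uses plain substring search instead of an index.
    """
    ms = []
    length = 0
    for i in range(len(pattern)):
        # If pattern[i-1 : i-1+length] occurs in text, then so does
        # pattern[i : i-1+length], hence ms[i] >= ms[i-1] - 1.
        length = max(length - 1, 0)
        # Extend the match character by character while it still occurs.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("banana", "nanab"))
```

An index-based approach replaces the substring tests in the inner loop with queries on a compressed index, which is where helper structures such as LCE queries come into play.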
|