Project/Area Number |
21K17701
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 60010: Theory of informatics-related
|
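Research Institution | University of Yamanashi (2023) Tokyo Medical and Dental University (2021-2022) |
Principal Investigator |
Koeppl Dominik, University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Specially Appointed Associate Professor (50897395)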
|
Project Period (Fiscal Year) |
2021-04-01 – 2025-03-31
|
Project Status |
Granted (Fiscal Year 2023)
|
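Budget Amount |
Total: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2022: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)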
|
Keywords | compressed indexes / string subsequences / NP-hard problems / straight line programs / collage systems / block trees / parameterized BWT / pattern matching / data compression / matrix multiplication / matrix compression / subsequences / compact hashing / SIMD instructions / hybrid text indexes / compression techniques / indexing data structures / algorithm engineering / lossless compression / hybrid indexes |
Outline of Research at the Start |
With the increasing generation of massive datasets, there is a rising need to manage and analyze these datasets efficiently. Our idea to meet this need is to leverage compression techniques not only to compress data but also to process it in such a way that specific queries can be executed in reasonable time. We aim for practical and time-efficient compressed data structures that bridge the gap between traditional indexing solutions and compression techniques by embracing modern computer architectures.
|
Outline of Annual Research Achievements |
Following the research plan outlined for fiscal year 2023, we focused on three thematic areas: extending string regularities from substrings to subsequences, exploring NP-hard problems on strings, and refining compressed indexing data structures. In the first area, we computed the longest Lyndon subsequence within space and time bounds better than those presented at IWOCA 2022. We also showed how to compute the longest bordered and periodic subsequences, using novel tools that compute the longest common subsequences between all prefixes and suffixes of a text. For the longest bordered subsequence, we further established a conditional lower bound matching our quadratic running time. In the second area, we studied common NP-hard problems with strings as inputs by leveraging answer set programming solvers. We also proved that finding the smallest run-length compressed straight-line program (RLSLP) is NP-hard for unbounded alphabet sizes, adapted this proof to finding the smallest collage system, and devised a MAX-SAT encoding for computing the smallest RLSLP. In the final area, we made advances in construction, practically for block trees and theoretically for the parameterized Burrows-Wheeler transform. For the latter, we also demonstrated that the transform can be adapted to circular pattern matching by changing the encoding.
|
Current Status of Research Progress |
2: Research is progressing generally smoothly
Reason
We conducted the research for fiscal year 2023 as planned and completed most of our planned research by the end of the grant's term in fiscal year 2023.
|
Strategy for Future Research Activity |
As the grant's term ended in fiscal year 2023, we are now preparing to apply for a new grant for fiscal year 2025. This research has opened new paths for further exploration in string regularities and compressed indexes, which we intend to pursue in the coming years. While our main attention has been on text indexing data structures for classic pattern matching, extended pattern matching queries remain largely unexplored. In response, we aim to expand on several concepts discovered during our recent research and combine them with state-of-the-art indexing techniques tailored for classic pattern matching. We anticipate that the resulting indexing methods will find practical applications in scenarios where conventional pattern matching proves too restrictive and more flexible matching criteria are needed.
|