2020 Fiscal Year Final Research Report
String Indexing Based on Space-Optimal Grammar Compression and Its Application to Knowledge Discovery from Stream Data
Project/Area Number | 18K18111 |
Research Category | Grant-in-Aid for Early-Career Scientists |
Allocation Type | Multi-year Fund |
Review Section | Basic Section 61030: Intelligent informatics-related |
Research Institution | Kyushu Institute of Technology |
Principal Investigator |
|
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | Data compression / Compressed index / Compressed information processing / Grammar compression / BWT |
Outline of Final Research Achievements |
Highly repetitive text collections now exceed a terabyte in size and continue to grow. In this research, we developed grammar compression methods and Online Run-Length BWTs (ORLBWTs) that compress such large streaming data at high speed while working in compressed space. We also developed various information processing techniques that operate on the compressed data. Although we could not develop a grammar-based compressed index supporting real-time keyword searches on large streaming data, we significantly improved the construction time of ORLBWTs, and this work led to an ORLBWT-based compressed index supporting real-time searches on large streaming data [Bannai et al. TCS2020].
|
Free Research Field | Data compression of strings and information retrieval on the compressed data |
Academic Significance and Societal Importance of the Research Achievements |
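The grammar compression methods and the Online Run-Length BWT (ORLBWT) developed in this project make it possible to compress data exceeding a terabyte with less memory and at higher speed. In addition, a compressed index that applies the developed ORLBWT and supports real-time keyword search makes it possible to extract information efficiently from huge streaming data. Applying the various compressed information processing techniques developed here is also expected to enable real-time knowledge discovery from huge streaming data.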
|