Project/Area Number |
17K00301
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Shizuoka University (2018-2020) / University of Yamanashi (2017) |
Principal Investigator |
|
Project Period (FY) |
2017-04-01 – 2021-03-31
|
Project Status |
Completed (Fiscal Year 2020)
|
Budget Amount |
¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2019: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2018: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2017: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000)
|
Keywords | stream data / online algorithm / sequence prediction / frequent pattern mining / frequent sequential pattern mining / lossy compression / anomaly and change detection |
Outline of Final Research Achievements |
In this research, we developed a fast and memory-efficient algorithm for frequent sequential pattern mining from streaming data (FSP-SD). Streaming data analysis is a central issue in many domains, and FSP-SD is one of its most fundamental tasks involving discrete structures. FSP-SD poses two main challenges: (1) real-time processing, that is, handling a huge volume of transactions that arrive continuously at high speed while simultaneously outputting the frequent sequences (FSs); and (2) memory efficiency, that is, enumerating FSs while managing an exponential number of candidates with limited memory. We addressed both challenges with a novel technique that integrates approximation and compression. The proposed algorithm and its implementation, called PARASOL, were published in the Journal of Intelligent Information Systems and are now freely available for academic use. We also applied PARASOL to the event prediction problem.
|
Academic Significance and Societal Importance of the Research Achievements |
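With the growth of cloud services and the IoT, large amounts of streaming data are being generated. The impact of streaming data lies in real-time analysis, which in turn requires processing large volumes of data quickly and with little memory. The problem addressed in this research involves technical constraints and difficulties common to stream data mining for online processing, such as combinatorial explosion and real-time requirements, and is therefore positioned as an important fundamental problem. Through this research, data mining methods have become more applicable to large-scale data that was previously difficult to handle, making correlation analysis and time-series analysis of big data possible with inexpensive computing resources.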
|