Project/Area Number |
10558047
|
Research Category |
Grant-in-Aid for Scientific Research (B).
|
Allocation Type | Single-year Grants |
Section | Developmental Research |
Research Field |
Computer science
|
Research Institution | KYUSHU UNIVERSITY |
Principal Investigator |
SHINOHARA Ayumi KYUSHU UNIVERSITY, Faculty of Information Science and Electrical Engineering, Department of Informatics, Assoc. Prof. (00226151)
|
Co-Investigator (Kenkyū-buntansha) |
SHIMOZONO Shinichi KYUSHU UNIVERSITY, Faculty of Computer Science and Systems Engineering, Department of Artificial Intelligence, Assoc. Prof. (70243988)
SAKAMOTO Hiroshi KYUSHU UNIVERSITY, Faculty of Information Science and Electrical Engineering, Department of Informatics, Res. Assoc. (50315123)
TAKEDA Masayuki KYUSHU UNIVERSITY, Faculty of Information Science and Electrical Engineering, Department of Informatics, Assoc. Prof. (50216909)
ZEUGMANN Thomas (ZEUGMANN Tho) KYUSHU UNIVERSITY, Graduate School of Information Science and Electrical Engineering, Professor (60264016)
|
Project Period (FY) |
1998 – 2000
|
Project Status |
Completed (Fiscal Year 2000)
|
Budget Amount |
¥10,400,000 (Direct Cost: ¥10,400,000)
Fiscal Year 2000: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1999: ¥3,900,000 (Direct Cost: ¥3,900,000)
Fiscal Year 1998: ¥4,500,000 (Direct Cost: ¥4,500,000)
|
Keywords | Pattern matching / Data compression / Compressed pattern matching / Data mining / Machine learning / Subsequence automaton / Straight-line program / String matching / Full-text search / Computational learning theory / BPE compression / Lempel-Ziv compression |
Research Abstract |
On the theoretical side of compressed pattern matching, we introduced a unified framework, called the collage system, that covers various dictionary-based data compression methods. We developed both Knuth-Morris-Pratt type and Boyer-Moore type pattern matching algorithms for collage systems. We adapted these algorithms to the Byte-Pair-Encoding (BPE) compression method, which yields the fastest compressed pattern matching algorithm in practice. Multiple pattern matching and approximate string matching were also handled successfully within collage systems. We also applied the method to Sequitur, another promising compression program, and verified its performance. Moreover, we studied efficient fully compressed pattern matching for balanced straight-line programs, where not only the text strings but also the pattern strings are compressed. We also developed an online algorithm that constructs, from a given set of strings, a subsequence automaton that accepts all subsequences of any string in the set. The algorithm is the fastest, and we verified that it is quite useful for accelerating a knowledge discovery system. In addition, concerning knowledge discovery from databases, we studied the learnability of tree transformation rules from examples and the search for optimal association rules of words in large text databases. Journal of Discrete Algorithms, 1(1), 2000
|
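The abstract describes Knuth-Morris-Pratt type matching over BPE-compressed text. The following is a minimal sketch of that general idea, not the project's actual algorithm: it assumes a toy BPE compressor and precomputes, for every compressed symbol, how the KMP automaton's state changes across the symbol's expansion and how many matches end inside it. All function names (`bpe_compress`, `kmp_automaton`, `count_matches_compressed`, `expand`) are hypothetical, and the tables are deliberately unoptimized, with one entry per pair of symbol and automaton state.

```python
from collections import Counter


def bpe_compress(text, max_rules=10):
    """Toy BPE (illustrative only): repeatedly replace the most frequent
    adjacent pair with a fresh integer symbol. Returns (sequence, rules)."""
    seq, rules = list(text), {}
    for new_sym in range(max_rules):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        rules[new_sym] = pair  # new_sym expands to pair[0] followed by pair[1]
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(sym, rules):
    """Decompress one symbol back to plain text (used only for checking)."""
    if sym not in rules:
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def kmp_automaton(pattern, alphabet):
    """delta[q][c] is the next state; state len(pattern) means a full match."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for c in alphabet:
        delta[0][c] = 1 if pattern[0] == c else 0
    x = 0  # state reached after reading pattern[1:q]
    for q in range(1, m + 1):
        for c in alphabet:
            delta[q][c] = delta[x][c]
        if q < m:
            delta[q][pattern[q]] = q + 1
            x = delta[x][pattern[q]]
    return delta


def count_matches_compressed(seq, rules, pattern):
    """Count (possibly overlapping) occurrences without decompressing seq."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    leaves = {s for s in seq if isinstance(s, str)}
    leaves |= {s for pair in rules.values() for s in pair if isinstance(s, str)}
    alphabet = leaves | set(pattern)
    delta, m = kmp_automaton(pattern, alphabet), len(pattern)
    states = range(m + 1)
    # jump[s][q]: state after reading the expansion of s starting from state q.
    # hits[s][q]: number of matches that end inside that expansion.
    jump = {c: [delta[q][c] for q in states] for c in alphabet}
    hits = {c: [int(delta[q][c] == m) for q in states] for c in alphabet}
    for sym, (left, right) in rules.items():  # rules only refer to earlier symbols
        jump[sym] = [jump[right][jump[left][q]] for q in states]
        hits[sym] = [hits[left][q] + hits[right][jump[left][q]] for q in states]
    state = total = 0
    for sym in seq:  # one table lookup per compressed symbol
        total += hits[sym][state]
        state = jump[sym][state]
    return total


seq, rules = bpe_compress("abracadabra abracadabra")
print(len(seq), count_matches_compressed(seq, rules, "abra"))
```

The scan touches each compressed symbol once, so the work in the main loop is proportional to the compressed length rather than the original text length. The price in this sketch is the precomputed tables, whose size grows with the number of symbols times the pattern length plus one.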