2005 Fiscal Year Final Research Report Summary
Development of efficient machine discovery system based on data compression and pattern matching
Project/Area Number |
15300049
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | KYUSHU UNIVERSITY |
Principal Investigator |
TAKEDA Masayuki Kyushu Univ., Graduate School of Information Science and Electrical Engineering, Professor (50216909)
|
Co-Investigator (Kenkyū-buntansha) |
SHINOHARA Ayumi Tohoku Univ., Graduate School of Information Sciences, Professor (00226151)
SAKUMOTO Hiroshi Kyushu Institute of Technology, Department of Artificial Intelligence, Associate Professor (50315123)
SUGIMOTO Noriko KYUSHU UNIVERSITY, Computing and Communications Center, Research Associate (80271120)
ISHINO Akira KYUSHU UNIVERSITY, Office for Information of University Evaluation, Research Associate (10315129)
NANRI Tomoko Doshisha Univ., Faculty of Culture and Information, Assistant Professor (50363388)
|
Project Period (FY) |
2003 – 2005
|
Keywords | Algorithms / Machine learning / Machine discovery / Pattern matching / Data compression / Semi-structured data / Pattern discovery |
Research Abstract |
We studied the following three topics with the aim of building efficient machine discovery systems.
(1) Text compression and pattern matching. We focused on grammar-based compression and developed efficient compression algorithms. Using them, we addressed the compressed pattern matching problem and obtained efficient algorithms.
(2) Time-efficient processing of text and semi-structured text. We developed text index structures to accelerate text processing. Suffix trees and DAWGs (directed acyclic word graphs) are well-known index structures for substring pattern matching. We focused on the CDAWG (compact DAWG), a hybrid of the two, and devised an online linear-time construction algorithm for CDAWGs. We then devised a construction algorithm for CDAWGs over a sliding window, which has an application to text data compression. We also proposed a new index structure for large alphabets (such as Japanese text) and demonstrated its efficiency experimentally. In addition, we analyzed the properties of subsequence automata, which are index structures for subsequence pattern matching, in order to accelerate subsequence pattern discovery. We gave a solution to the problem of online linear-time construction of word suffix trees, which had been open for over 10 years. We also developed a fast tree pattern matching algorithm based on the bit-parallel technique for efficient processing of semi-structured text data.
(3) Pattern discovery and information extraction. We developed efficient pattern discovery algorithms for various classes of patterns, implemented them, and evaluated their performance experimentally. We integrated the developed techniques into a knowledge discovery system, applied it to linguistic and literary data, and obtained good results in cooperation with linguists and literary scholars.
|
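As an illustration of the subsequence automata mentioned in item (2) of the abstract, the following is a minimal sketch for a single text. It is not the project's algorithm: the function names `build_subsequence_automaton` and `is_subsequence` are hypothetical, and the construction shown is the simple textbook one, which stores a full transition table per state and therefore uses space proportional to the text length times the alphabet size.

```python
def build_subsequence_automaton(text):
    """Build a simple subsequence automaton for one text.

    State i means "the first i characters of text have been passed".
    transitions[i][c] is the state reached after consuming the first
    occurrence of character c at or after position i.
    (Hypothetical helper for illustration only.)
    """
    n = len(text)
    transitions = [dict() for _ in range(n + 1)]
    # Fill states from right to left: state i inherits the transitions of
    # state i + 1, then overrides the entry for text[i] with a nearer target.
    for i in range(n - 1, -1, -1):
        transitions[i] = dict(transitions[i + 1])
        transitions[i][text[i]] = i + 1
    return transitions


def is_subsequence(transitions, pattern):
    """Return True if pattern is a subsequence of the indexed text."""
    state = 0
    for ch in pattern:
        state = transitions[state].get(ch)
        if state is None:  # no further occurrence of ch in the text
            return False
    return True


automaton = build_subsequence_automaton("abcab")
print(is_subsequence(automaton, "acb"))  # characters at positions 0, 2, 4
print(is_subsequence(automaton, "cc"))   # only one 'c' in the text
```

Once the automaton is built, each query takes time proportional to the pattern length, independent of the text length, which is what makes such a structure useful as an index when many candidate patterns are tested against the same text.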
Research Products
(80 results)