Project/Area Number |
17002008
|
Research Category |
Grant-in-Aid for Specially Promoted Research
|
Allocation Type | Single-year Grants |
Review Section |
Science and Engineering
Engineering
|
Research Institution | Hokkaido University |
Principal Investigator |
ARIMURA Hiroki Hokkaido University, Grad. School of IST., Prof. (20222763)
|
Co-Investigator (Kenkyū-buntansha) |
KIDA Takuya Hokkaido Univ., Grad. School of IST., Assoc. Prof. (70343316)
MINATO Shin-ichi Hokkaido Univ., Grad. School of IST., Assoc. Prof. (10374612)
ITO Kimihito Hokkaido Univ., Research Center for Zoonosis Control, Assoc. Prof. (60396314)
|
Project Period (FY) |
2005 – 2007
|
Project Status |
Completed (Fiscal Year 2007)
|
Budget Amount |
¥142,090,000 (Direct Cost: ¥109,300,000, Indirect Cost: ¥32,790,000)
Fiscal Year 2007: ¥9,490,000 (Direct Cost: ¥7,300,000, Indirect Cost: ¥2,190,000)
Fiscal Year 2006: ¥75,660,000 (Direct Cost: ¥58,200,000, Indirect Cost: ¥17,460,000)
Fiscal Year 2005: ¥56,940,000 (Direct Cost: ¥43,800,000, Indirect Cost: ¥13,140,000)
|
Keywords | knowledge infrastructure formation / semi-structured data / data mining / optimized pattern discovery / knowledge society / knowledge indexing / information extraction / sequence and tree mining / knowledge index structures / maximal pattern discovery / graph mining |
Research Abstract |
Owing to rapid progress in network and storage technologies over the last decade, a huge amount of weakly structured electronic data of various types, called semi-structured data, has become available over the Internet. In this research project, we studied efficient semi-structured data mining technologies that support human discovery of useful knowledge from massive collections of semi-structured data. In particular, we developed high-speed semi-structured data mining engines as the core of large-scale knowledge infrastructure formation from the Internet, and established their architecture and base technologies. We studied the following research topics:
1. High-speed semi-structured data mining technologies. We developed efficient depth-first mining algorithms for sequences and trees based on the rightmost expansion technique proposed by Hiroki Arimura and other researchers. We also devised various optimization techniques within the frequent and maximal pattern mining frameworks. We analyzed their computational complexity theoretically and showed that they are polynomial-delay, polynomial-space enumeration algorithms, which means that they can be used as lightweight, high-throughput mining engines. From a theoretical point of view, they are the first maximal pattern mining algorithms for non-trivial subclasses of semi-structured data with theoretically guaranteed performance.
2. Knowledge indexing technologies. Pre- and post-processing of data and patterns are important issues for next-generation data mining. We developed an efficient knowledge indexing data structure, called VSOP, based on the Zero-suppressed BDD (ZBDD) proposed by Shin-ichi Minato. VSOP is a compressed indexing structure for discrete structures, equipped with a rich set of algebraic manipulation operators for compressed transactions and itemsets. On top of VSOP, we devised efficient data mining algorithms that can extract interesting structures hidden in a huge input dataset in a highly interactive way.
3. Knowledge federation technologies. We combined semi-structured data processing techniques with information retrieval and information extraction from the Web and stream data. As a result, we developed fast pattern matching algorithms for multi-dimensional numerical streams based on bit-parallel techniques, and semi-automatic information extraction algorithms for the Web based on demonstration-by-example.
We also implemented a set of knowledge discovery tools based on the algorithms and technologies devised in this project, and ran experiments to evaluate the prototype system on real-world data from the Web and from bioinformatics.
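As an illustration of the depth-first, rightmost-expansion style of mining mentioned in topic 1, the following is a minimal sketch for the sequence case only, where rightmost expansion reduces to appending one item at the end of the current pattern. It is not the project's mining engine: the function names (`is_subsequence`, `mine_frequent_subsequences`), the subsequence-based notion of support, and the toy data are all assumptions made for this example, and the tree case (which expands along the rightmost path of a tree pattern) is not covered.

```python
from typing import Dict, List, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def mine_frequent_subsequences(
    db: List[Sequence[str]], min_support: int
) -> Dict[Tuple[str, ...], int]:
    """Enumerate all subsequence patterns contained in >= min_support sequences.

    Hypothetical helper for illustration: each pattern is reached from exactly
    one parent (the pattern minus its last item), so the depth-first search
    generates no duplicates and needs no table of visited patterns.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")

    alphabet = sorted({item for seq in db for item in seq})
    results: Dict[Tuple[str, ...], int] = {}

    def expand(pattern: Tuple[str, ...], occurrences: List[int]) -> None:
        for item in alphabet:
            candidate = pattern + (item,)  # rightmost expansion: append one item
            # Only sequences that contain the parent can contain the child.
            support_ids = [i for i in occurrences if is_subsequence(candidate, db[i])]
            if len(support_ids) >= min_support:
                results[candidate] = len(support_ids)
                expand(candidate, support_ids)

    expand((), list(range(len(db))))
    return results


if __name__ == "__main__":
    toy_db = ["abc", "acb", "ab"]  # assumed toy data, not from the project
    for pat, sup in sorted(mine_frequent_subsequences(toy_db, 2).items()):
        print("".join(pat), sup)
```

Because infrequent candidates are pruned immediately and only the current search path is kept on the call stack, this sketch mirrors, in a very simplified form, why such enumeration schemes can run with modest memory.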
|