Autonomous Data Mining System based on Constructive Learning
Project/Area Number |
09680359
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Yokohama National University |
Principal Investigator |
SUZUKI Einoshin Yokohama National University, Faculty of Engineering, Associate Professor (10251638)
|
Project Period (FY) |
1997 – 1998
|
Project Status |
Completed (Fiscal Year 1998)
|
Budget Amount |
¥3,300,000 (Direct Cost: ¥3,300,000)
Fiscal Year 1998: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 1997: ¥1,900,000 (Direct Cost: ¥1,900,000)
|
Keywords | rule discovery / machine learning / data mining / constructive induction / dynamic bias / data-driven |
Research Abstract |
This research presents an autonomous method for discovering prediction rules with dynamic bias selection. A prototype system was developed, and its effectiveness was demonstrated experimentally. In current data mining systems, a user is involved in both pre-processing of the data set and knowledge discovery. To reduce the user's burden of choosing and adjusting multiple mining algorithms, we propose a knowledge discovery system that autonomously selects learning methods based on constructive induction. Our task is prediction rule discovery. A prediction rule, which aims to predict the class of an unseen example, deserves special attention because of its usefulness in various domains such as exploratory data analysis and automatic construction of a knowledge base. Our method consists of two phases: 1) pre-processing of a data set by autonomous discretization; 2) knowledge discovery by autonomous selection of the knowledge representation and autonomous adjustment of the evaluation criteria. Our method, based on novel data-driven criteria and constraints, selects appropriate biases, each of which is a component of a learning algorithm. The available biases are an equal-frequency method and a minimum-entropy method for discretization; a conjunction rule and an M-of-N rule for knowledge representation; and J-measure and predictiveness for the evaluation criterion. Our approach has been validated on 47 discovery tasks with real-world data sets such as retail sales data. We have discussed quantitative evaluation criteria for prediction rule discovery and proposed J-measure with cross-validation. Our method, compared with the best combinations of biases, achieved more than 90% J-measure with cross-validation in 30 tasks. Careful analysis revealed that our approach is effective unless the provided data set is extremely small. We also considered large-scale data sets and developed a parallel system on multiple personal computers.
|
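As an illustration of two of the biases named in the abstract, the sketch below shows equal-frequency discretization and the J-measure in its standard form (Smyth and Goodman) for a rule "if Y = y then X = x". It is a minimal sketch, not the project's implementation: the function names `equal_frequency_cut_points` and `j_measure` and their parameters are hypothetical, and the abstract does not specify how the prototype computes either quantity.

```python
import math


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points splitting the values into bins of roughly equal size.

    Hypothetical helper: a value v falls in bin k when exactly k cut points are <= v.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def j_measure(p_y, p_x, p_x_given_y):
    """J-measure, in bits, of the rule 'if Y = y then X = x'.

    p_y          probability that the premise holds
    p_x          prior probability of the conclusion
    p_x_given_y  probability of the conclusion given the premise
    Assumes consistent probabilities, so a prior of 0 or 1 implies the same posterior.
    """
    def term(posterior, prior):
        # By convention, 0 * log(0 / q) is taken as 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    return p_y * (term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x))
```

A rule whose premise leaves the class distribution unchanged scores 0, while a frequent premise that sharply shifts the distribution scores higher, which is why the measure can rank candidate rules by both generality and accuracy.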
Report
(3 results)
Research Products
(6 results)