Development of Constructive Induction Method of Useful Attributes from Complex Structured Data
Project/Area Number |
16300046
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Osaka University |
Principal Investigator |
MOTODA Hiroshi Osaka University, Institute of Scientific and Industrial Research, Professor (00283804)
|
Co-Investigator (Kenkyū-buntansha) |
WASHIO Takashi Osaka University, Institute of Scientific and Industrial Research, Associate Professor (00192815)
YOSHIDA Tetsuya Osaka University, Graduate School of Information Science, Associate Professor (80294164)
OHARA Kouzou Osaka University, Institute of Scientific and Industrial Research, Research Associate (30294127)
|
Project Period (FY) |
2004 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥13,800,000 (Direct Cost: ¥13,800,000)
Fiscal Year 2005: ¥7,100,000 (Direct Cost: ¥7,100,000)
Fiscal Year 2004: ¥6,700,000 (Direct Cost: ¥6,700,000)
|
Keywords | Machine Learning / Knowledge Discovery / Data Mining / Clustering / Graph Mining / Time Series Analysis / Feature Construction |
Research Abstract |
In data mining, where useful knowledge is extracted from large amounts of data, the standard practice is to use the attributes of the original data representation. However, the original attributes are often not expressive enough, and new attributes must be constructed from them. This is called feature construction, and a better method for it is still needed. In this research, a new feature construction method interleaved with the construction of a decision tree was developed, and its performance was tested on both artificial and real-world datasets. Because the forms of data to be handled have become diverse and a graph is a good way to represent data of general form, a graph mining method based on sequential chunking is coupled with a decision tree construction method. The subgraph found at each decision node can be regarded as a constructed attribute. The biggest problem of straightforward chunking, its inability to find overlapping patterns, can be avoided by devising pseudo-chunking. The resulting CI-GBI (Chunkingless Graph-Based Induction) can perform a complete search when its parameters are set appropriately. Since it does not rely on the anti-monotonicity of subgraph subsumption, it can find subgraphs that other state-of-the-art approaches cannot. Furthermore, because the frequency counts of the found subgraphs are guaranteed to be accurate, indices that use frequency, e.g. information gain, are also evaluated accurately, which makes CI-GBI better suited as a feature construction component in decision tree construction. Subgraph search is called recursively during tree construction, and the best feature is constructed on the fly at each decision node. Compared with the straightforward chunking approach, the constructed tree is much smaller and the predictive accuracy on unseen instances is higher. Application to a chronic hepatitis dataset indicated that it is indeed possible to predict liver cirrhosis from blood tests alone.
|
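The abstract's point that accurate frequency counts lead to accurate information gain at a decision node can be illustrated with a minimal sketch. This is not the project's implementation: it assumes each graph is simplified to a set of labeled edges, treats a pattern as "present" when its edge set is a subset of the graph's (a stand-in for real subgraph matching), and takes the candidate patterns as given rather than searching for them. The names `entropy`, `information_gain`, and `best_pattern` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(graphs, labels, pattern):
    """Gain from splitting the instances on presence/absence of `pattern`.

    ASSUMPTION: graphs and patterns are frozensets of labeled edges, and
    presence is a plain subset test, not true subgraph isomorphism.
    """
    present = [y for g, y in zip(graphs, labels) if pattern <= g]
    absent = [y for g, y in zip(graphs, labels) if not pattern <= g]
    n = len(labels)
    remainder = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder


def best_pattern(graphs, labels, candidates):
    """Pick the candidate pattern with the highest information gain.

    In the approach the abstract describes, candidates would come from a
    subgraph search run at each decision node; here they are simply passed in.
    """
    return max(candidates, key=lambda p: information_gain(graphs, labels, p))
```

The chosen pattern would then serve as the test at that decision node, with the instances split by presence or absence and the same selection repeated on each side. If the occurrence counts behind `present` and `absent` were wrong, the gain would be wrong too, which is why accurate frequency counting matters for this use.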
Report
(3 results)
Research Products
(90 results)