A STUDY OF FEATURE (ATTRIBUTE) SELECTION IN DATA MINING
Project/Area Number | 12680398 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Tokyo Denki University |
Principal Investigator | MANABU Ichino, Tokyo Denki University, Department of Information and Arts, Professor; School of Science and Engineering, Professor (40057245) |
Project Period (FY) | 2000 – 2001 |
Project Status | Completed (Fiscal Year 2001) |
Budget Amount |
¥1,600,000 (Direct Cost: ¥1,600,000)
Fiscal Year 2001: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 2000: ¥1,100,000 (Direct Cost: ¥1,100,000)
|
Keywords | pattern recognition / data mining / feature selection / neighborhood graph / feature evaluation / geometrical thickness / analysis of variance / Calhoun correlation coefficient / correlation coefficient / 0-1 integer programming / handwritten character recognition / separability / generality of description |
Research Abstract |
The purpose of this research is to develop methods of feature (attribute) selection for data mining. We report results on feature selection for classification problems, and we then report a new correlation coefficient that is applicable to various nonlinear relationships between feature variables.

1) Feature selection for classification problems
When only a finite number of samples is available, adding new features to describe the samples does not necessarily improve classification performance. We therefore have to strike a balance between interclass distinguishability and the generality of class descriptions. We introduced two graphs, the generality ordered mutual neighborhood graph and the generality ordered interclass mutual neighborhood graph, and then developed a feature selection algorithm based on modified zero-one integer programming, together with a simplified version of that algorithm.

2) Generalized correlation coefficient
Pearson's correlation coefficient is useful for detecting linear relationships between feature variables. However, this well-known tool is not applicable to general nonlinear relationships. If two feature variables are related by a functional relationship, the sample distribution with respect to those variables has a geometrically thin structure. We developed a generalized correlation coefficient, called the Calhoun correlation coefficient. This new measure is able to evaluate various nonlinear functional relations and other geometrically thin structures.
|
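The zero-one integer programming idea in part 1) can be illustrated with a much simpler formulation. The sketch below is an assumption-laden stand-in, not the project's algorithm: it leaves out the generality ordered neighborhood graphs entirely and treats feature selection as set covering. A binary variable z_k marks whether feature k is kept; an interclass pair of samples counts as separated by feature k if their values differ by more than a hypothetical threshold `tau`; and the program minimizes the sum of z_k subject to every interclass pair being separated by at least one kept feature. Minimizing the feature count is only a crude proxy for the generality side of the trade-off. The names `separation_matrix`, `select_features` and `tau` are illustrative, and the exhaustive search is exact but only practical for a handful of features.

```python
from itertools import combinations


def separation_matrix(samples, labels, tau):
    """One row per interclass pair of samples; entry k is 1 if the pair
    differs by more than tau on feature k (a hypothetical separation rule)."""
    n_features = len(samples[0])
    rows = []
    for i, j in combinations(range(len(samples)), 2):
        if labels[i] != labels[j]:
            rows.append([int(abs(samples[i][k] - samples[j][k]) > tau)
                         for k in range(n_features)])
    return rows


def select_features(samples, labels, tau=0.5):
    """Exact solution of the 0-1 set-covering program
        minimize sum(z_k)  s.t.  sum_k a[p][k] * z_k >= 1 for every row p
    by enumerating feature subsets in order of increasing size."""
    a = separation_matrix(samples, labels, tau)
    n_features = len(samples[0])
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            # Feasible if every interclass pair is separated by a kept feature.
            if all(any(row[k] for k in subset) for row in a):
                return list(subset)
    return None  # some interclass pair is inseparable at this tau


if __name__ == "__main__":
    samples = [(0.0, 5.0, 1.0), (0.2, 1.0, 1.1),   # class A
               (1.0, 5.2, 1.0), (1.1, 0.9, 1.2)]   # class B
    labels = ["A", "A", "B", "B"]
    print(select_features(samples, labels))  # → [0]
```

In this toy data, feature 1 separates only half of the interclass pairs and feature 2 separates none, so the search settles on feature 0 alone. For larger feature sets, a branch-and-bound or general integer programming solver would replace the enumeration.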
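The motivation for part 2) can be checked numerically. The sketch below assumes nothing about how the Calhoun correlation coefficient is actually defined, since the abstract does not give its formula. It first shows that Pearson's r is essentially zero for the exact functional relation y = x² on a symmetric interval, and then computes a hypothetical "thinness" score chosen purely for illustration (the function names `pearson` and `thinness` and the score itself are assumptions): after sorting the samples by x, it compares the mean squared jump in y between neighboring samples with 2·var(y), the value expected when y is independent of x. A thin, curve-like sample distribution gives a score near 1; unstructured noise gives a score near 0.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def thinness(xs, ys):
    """Hypothetical geometric-thinness score (NOT the Calhoun coefficient).

    Sorts the samples by x and compares the mean squared jump in y between
    x-neighbors with 2 * var(y), its expected value when y is independent
    of x. Near 1: y is close to a smooth function of x (a thin band).
    Near 0: no functional structure is visible."""
    pts = sorted(zip(xs, ys))
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two samples")
    my = sum(y for _, y in pts) / n
    var_y = sum((y - my) ** 2 for _, y in pts) / n
    if var_y == 0:
        raise ValueError("y is constant")
    jumps = sum((pts[i + 1][1] - pts[i][1]) ** 2 for i in range(n - 1)) / (n - 1)
    return 1 - jumps / (2 * var_y)


if __name__ == "__main__":
    xs = [i / 50 for i in range(-50, 51)]   # symmetric grid on [-1, 1]
    ys = [x * x for x in xs]                # y = x^2: a thin parabola
    print(abs(pearson(xs, ys)) < 1e-9)      # True: Pearson sees nothing
    print(thinness(xs, ys) > 0.99)          # True: the band is thin
    rng = random.Random(0)
    us = [rng.random() for _ in range(1000)]
    vs = [rng.random() for _ in range(1000)]  # independent noise
    print(thinness(us, vs))                   # close to 0 for noise
```

Unlike Pearson's r, this illustrative score is not symmetric in x and y and does not distinguish increasing from decreasing relations; it only shows why measuring geometric thickness can capture relations that a linear coefficient misses.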
Report | 3 results |
Research Products | 13 results |