2005 Fiscal Year Final Research Report Summary
A study of symbolic data analysis based on neighborhood graphs.
Project/Area Number | 16500089 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Tokyo Denki University |
Principal Investigator | ICHINO Manabu, Tokyo Denki Univ., School of Science and Engineering, Dept. of Inf. & Arts, Professor (40057245) |
Project Period (FY) | 2004 – 2005 |
Keywords | symbolic data / pattern recognition / feature selection / neighborhood graph / discrimination / correlation analysis / generalized correlation coefficient / geometrical thickness |
Research Abstract |
The purpose of this research is to develop new methods for Symbolic Data Analysis (SDA). SDA is a new research field concerned with generalized data tables in which each object is described not only by quantitative feature values but also by qualitative feature values. Our research results are summarized below.

1) Feature selection for classification problems
When only a finite number of training samples is available, adding new features to describe those samples does not necessarily improve classification performance. We therefore have to strike a balance between interclass distinguishability and the generality of class descriptions. We introduce the Cartesian System Model (CSM) as a mathematical model for treating symbolic data, and we then define the notions of the inside view and the outside view based on neighborhood relations among samples. For a given feature subset, the size of the outside view indicates interclass distinctness, and the size of the inside view indicates the generality of class descriptions. Our interclass analysis combines a simple local feature selection method with the sizes of the inside and outside views. We demonstrated the usefulness of this approach on data sets from the UCI database. Because our interclass analysis does not depend on any particular classifier, it can serve as a preprocessing step in the design of many pattern classifiers.

2) Generalized correlation coefficient
Pearson's correlation coefficient is useful for detecting linear relations, but more general tools are needed to treat nonlinear relations and broader forms of covariation. If two feature variables are related by a functional structure, the scatter diagram of the samples shows a geometrically thin structure. From this viewpoint, we have developed the Calhoun correlation coefficient for two quantitative feature variables, as well as a method based on the relative neighborhood relations among samples. In this study we found another method, based on the chain connected covering (CCC). The CCC can treat general symbolic objects and detect monotonic structures embedded in symbolic data tables. This approach may be useful for generalizing principal component analysis (PCA) and clustering methods.
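The summary does not define the inside and outside views, so the following Python sketch only illustrates the kind of neighborhood computation a CSM-based feature-subset analysis involves, under stated assumptions. It assumes every feature is interval-valued (a single value x is stored as the interval [x, x]), that the Cartesian join of two objects is the smallest interval covering both on each selected feature (for nominal features the CSM join is usually taken as a set union, which is omitted here), and that two same-class objects count as mutual neighbors when their join overlaps no object of another class. The names `cartesian_join`, `overlaps`, and `mutual_neighbor_pairs` are hypothetical and do not come from the report.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
# Hypothetical encoding: a symbolic object is a list of closed intervals, one per
# feature; a single value x is stored as the degenerate interval (x, x).
SymbolicObject = List[Interval]
Region = Dict[int, Interval]


def cartesian_join(a: SymbolicObject, b: SymbolicObject, features: Sequence[int]) -> Region:
    """Smallest interval covering both objects on each selected feature."""
    return {k: (min(a[k][0], b[k][0]), max(a[k][1], b[k][1])) for k in features}


def overlaps(region: Region, obj: SymbolicObject) -> bool:
    """True if obj intersects the region on every selected feature."""
    return all(lo <= obj[k][1] and obj[k][0] <= hi for k, (lo, hi) in region.items())


def mutual_neighbor_pairs(
    objs: List[SymbolicObject], labels: List[str], features: Sequence[int]
) -> List[Tuple[int, int]]:
    """Same-class pairs whose Cartesian join overlaps no object of another class."""
    pairs = []
    for i, j in combinations(range(len(objs)), 2):
        if labels[i] != labels[j]:
            continue
        region = cartesian_join(objs[i], objs[j], features)
        intruded = any(
            overlaps(region, objs[m])
            for m in range(len(objs))
            if labels[m] != labels[i]
        )
        if not intruded:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    objs = [
        [(1.0, 1.0), (1.0, 1.0)],  # class A
        [(3.0, 3.0), (1.0, 1.0)],  # class A
        [(2.0, 2.0), (5.0, 5.0)],  # class B, between the A objects on feature 0
    ]
    labels = ["A", "A", "B"]
    for subset in ([0], [1], [0, 1]):
        print(subset, mutual_neighbor_pairs(objs, labels, subset))
    # [0] []
    # [1] [(0, 1)]
    # [0, 1] [(0, 1)]
```

With feature 0 alone, the class-B object falls inside the join of the two class-A objects, so no same-class pair survives; any subset containing feature 1 lets both class-A objects be covered by one description. Counting such pairs per feature subset is only one plausible proxy for the generality of class descriptions and should not be read as the authors' inside-view measure.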
|
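The motivation for a generalized correlation coefficient can be checked numerically. In this Python sketch, five samples lying exactly on the parabola y = x^2 / 2 give a Pearson coefficient of zero even though their scatter diagram is a thin curve. The relative neighborhood graph (RNG) links two samples p and q unless some third sample r satisfies max(d(p, r), d(q, r)) < d(p, q); on these samples it reduces to a single chain. The chain test `is_chain` is a hypothetical thinness proxy chosen for illustration; it is neither the Calhoun correlation coefficient nor the CCC, whose definitions the summary does not give.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson's correlation coefficient (assumes neither variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def relative_neighborhood_graph(pts: List[Point]) -> List[Tuple[int, int]]:
    """Edge (i, j) unless some k is strictly closer to both i and j than they are to each other."""
    edges = []
    for i, j in combinations(range(len(pts)), 2):
        dij = math.dist(pts[i], pts[j])
        blocked = any(
            max(math.dist(pts[i], pts[k]), math.dist(pts[j], pts[k])) < dij
            for k in range(len(pts))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


def is_chain(edges: List[Tuple[int, int]], n: int) -> bool:
    """Hypothetical thinness proxy: the RNG is a simple path.

    The RNG contains every Euclidean minimum spanning tree, so it is connected;
    n - 1 edges with maximum degree 2 therefore means a single path.
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return len(edges) == n - 1 and max(degree) <= 2


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [x * x / 2 for x in xs]          # exact functional relation y = x^2 / 2
    pts = list(zip(xs, ys))
    print(pearson(xs, ys))                # 0.0: no linear relation detected
    edges = relative_neighborhood_graph(pts)
    print(edges)                          # [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(is_chain(edges, len(pts)))      # True: a thin, chain-like structure
```

By contrast, the four corners of a unit square produce an RNG that is a 4-cycle, so `is_chain` returns False; a neighborhood-based measure thus separates thin from thick structures where Pearson's coefficient alone cannot.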
Research Products
(6 results)