2004 Fiscal Year Final Research Report Summary
Knowledge Discovery from Databases using Machine Learning and Data Envelopment Analysis and Its Application to Decision Support Systems
Project/Area Number |
13680460
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Aoyama Gakuin University |
Principal Investigator |
INAZUMI Hiroshige Aoyama Gakuin University, College of Science and Engineering, Professor (00168402)
|
Project Period (FY) |
2001 – 2004
|
Keywords | Data envelopment analysis / Machine Learning / Decision Tree / Knowledge discovery |
Research Abstract |
Gene expression data is one type of genome data that has become available through genome mapping. These data are collected using DNA microarrays, a technology that now makes it possible to monitor the expression levels of thousands of genes simultaneously. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. With gene expression data, it is meaningful to cluster both genes and samples. The goal of clustering samples is to find the phenotype structures or substructures of the samples. The phenotypes of samples can be discriminated through only a small subset of genes whose expression levels strongly correlate with the class distinction. These genes are called "informative genes". Before clustering samples, it is essential to select these informative genes from among all the monitored genes. We propose a new clustering method using DEA (Data Envelopment Analysis). DEA solves optimization problems with multiple-input/multiple-output models and is commonly used to evaluate the efficiency of a number of Decision Making Units (DMUs) by comparing each one directly against its peers. We first applied DEA to gene expression data using genes as DMUs; selecting informative genes with DEA collects genes whose expression patterns differ from one another. We then applied DEA using samples as DMUs and clustered the samples according to the DEA results. For example, we tested the method on the well-known leukemia data set, with 47 ALL samples and 25 AML samples. The selected informative genes gave higher classification accuracy than genes with a high gain ratio, and the method discovered subclusters of the given classes. Sample clustering can identify each cluster's representative sample and its characteristic points, which helps to explain the clusters. We conclude that DEA clustering has a high explanatory capability. Our future work is to consider combining DEA clustering with other clustering algorithms and applying it to time-series data.
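As a rough illustration of how DEA scores DMUs against their peers, the following Python sketch computes efficiency for the simplest possible case: one input and one output per DMU under constant returns to scale. In that special case each DMU's efficiency is its output/input ratio divided by the best ratio among all DMUs. The abstract does not state which DEA model, inputs, or outputs the project used, so the function name `ccr_efficiency_1x1` and the data below are hypothetical and do not reproduce the reported method; the multiple-input/multiple-output case requires solving a linear program per DMU.

```python
from typing import Dict, Tuple


def ccr_efficiency_1x1(units: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Efficiency of each DMU for ONE input and ONE output (hypothetical helper).

    `units` maps a DMU name to (input, output), both strictly positive.
    With a single input and a single output, the constant-returns-to-scale
    efficiency is the DMU's output/input ratio relative to the best ratio.
    """
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical DMUs (not data from the report): name -> (input, output)
dmus = {"A": (2.0, 4.0), "B": (4.0, 4.0), "C": (5.0, 10.0)}

scores = ccr_efficiency_1x1(dmus)

# DMUs with score 1.0 lie on the efficient frontier and act as reference
# peers for the inefficient ones.
efficient = sorted(name for name, s in scores.items() if abs(s - 1.0) < 1e-12)

for name in sorted(scores):
    print(f"{name}: efficiency = {scores[name]:.2f}")
print("reference peers:", efficient)
```

Here A and C share the best ratio and so both score 1.0, while B scores 0.5 relative to them. Frontier units of this kind play a role loosely analogous to the representative samples mentioned in the abstract, although how the project derived clusters from DEA results is not specified there.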
|
Research Products
(24 results)