Knowledge Discovery from Databases using Machine Learning and Data Envelopment Analysis and Its Application to Decision Support Systems
Project/Area Number |
13680460
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Aoyama Gakuin University |
Principal Investigator |
INAZUMI Hiroshige, Aoyama Gakuin University, College of Science and Engineering, Professor (00168402)
|
Project Period (FY) |
2001 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥3,200,000 (Direct Cost: ¥3,200,000)
Fiscal Year 2004: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2003: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2002: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2001: ¥800,000 (Direct Cost: ¥800,000)
|
Keywords | Data Envelopment Analysis / Machine Learning / Decision Tree / Knowledge Discovery / Clustering |
Research Abstract |
Gene expression data is one kind of genome data that became available through genome mapping. It is collected using DNA microarrays, a technology that makes it possible to monitor the expression levels of thousands of genes simultaneously. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks make the resulting mass of data much harder to comprehend and interpret. A first step toward addressing this challenge is clustering, which is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. For gene expression data, it is meaningful to cluster both genes and samples. The goal of clustering samples is to find the phenotype structures or substructures of the samples. The phenotypes of samples can be discriminated using only a small subset of genes whose expression levels are strongly correlated with the class distinction. These genes are called "informative genes". Before clustering samples, it is essential to select these informative genes from the full set of monitored genes. We propose a new clustering method using DEA (Data Envelopment Analysis). DEA solves optimization problems over models with multiple inputs and outputs, and is commonly used to evaluate the efficiency of a set of Decision Making Units (DMUs) by comparing each one directly against its peers. We first applied DEA to gene expression data with genes as DMUs. Selecting informative genes with DEA collects genes whose expression patterns differ from one another. We then applied DEA with samples as DMUs and clustered the samples according to the DEA results. For example, we tested the method on the well-known leukemia data set, which contains 47 ALL samples and 25 AML samples. The selected informative genes gave higher classification accuracy than genes with a high gain ratio, and the method discovered subclusters within the given classes. Sample clustering can identify each cluster's representative sample and its characteristic points, which helps to explain the clusters. We conclude that DEA clustering has high explanatory capability. Future work includes combining DEA clustering with other clustering algorithms and applying the method to time-series data.
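The efficiency evaluation that DEA performs can be illustrated with a minimal sketch. The abstract does not state which DEA model or which inputs and outputs were used, so everything below is an assumption for illustration: the standard CCR multiplier model, one unit of a single input per DMU, two outputs, and made-up data. Under those assumptions the efficiency of DMU o is the maximum of u1*y1o + u2*y2o subject to u1*y1j + u2*y2j <= 1 for every DMU j and u1, u2 >= 0. With only two weights, this small linear program can be solved by enumerating the vertices of the feasible region, which avoids any solver dependency. The function name `ccr_efficiency` and the `OUTPUTS` table are hypothetical.

```python
from itertools import combinations

# Hypothetical DMUs: each uses one unit of input and produces two outputs.
OUTPUTS = [(4.0, 1.0), (3.0, 3.0), (1.0, 4.0), (2.0, 2.0), (1.0, 1.0)]


def ccr_efficiency(outputs, target, eps=1e-9):
    """CCR efficiency of DMU `target` (unit input, two positive outputs).

    Solves  max u1*a_t + u2*b_t  s.t.  u1*a_j + u2*b_j <= 1 for all j,
    u1, u2 >= 0, by checking every vertex of the feasible polygon.
    """
    # Every constraint is stored as (c1, c2, rhs), meaning c1*u1 + c2*u2 <= rhs.
    constraints = [(a, b, 1.0) for a, b in outputs]
    constraints += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # u1 >= 0, u2 >= 0

    a_t, b_t = outputs[target]
    best = 0.0  # the origin is always feasible and scores 0
    for (c1, c2, r1), (d1, d2, r2) in combinations(constraints, 2):
        det = c1 * d2 - c2 * d1
        if abs(det) < eps:
            continue  # parallel constraint lines: no unique intersection
        # Cramer's rule for the point where both constraints are tight.
        u1 = (r1 * d2 - c2 * r2) / det
        u2 = (c1 * r2 - r1 * d1) / det
        feasible = all(e1 * u1 + e2 * u2 <= rhs + eps
                       for e1, e2, rhs in constraints)
        if feasible:
            best = max(best, a_t * u1 + b_t * u2)
    return best


if __name__ == "__main__":
    for j, (a, b) in enumerate(OUTPUTS):
        print(f"DMU {j} outputs=({a}, {b}) efficiency={ccr_efficiency(OUTPUTS, j):.3f}")
```

In this toy data the first three DMUs lie on the efficient frontier (efficiency 1), while the fourth is dominated by a scaled copy of the second, its peer. Treating frontier units as reference points for the others is one plausible reading of how DEA results could yield representative samples, but the actual clustering procedure of the project is not specified in this record. For more than two weights a general LP solver would be needed instead of vertex enumeration.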
|
Report
(5 results)
Research Products
(43 results)
[Journal Article] Development of an Oligonucleotide Microarray for Gene Expression Analysis in the Early Process of Mouse Liver Carcinogenesis (2004)
Author(s)
戸田香織, 原田基裕, 仲地豊, 近藤恭光, 中島圓, 浜田修一, 鈴木孝昌, 兵庫淳志, 星埜雅子, 田代英夫, 榊佳之, 伊藤尚, 稲積宏誠, 降旗千恵
Journal Title
27th Annual Meeting of the Molecular Biology Society of Japan
Pages: 269-269