2005 Fiscal Year Final Research Report Summary
New Approaches for Large-Scale Classification Problems
Project/Area Number | 16510106 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Social systems engineering / Safety system |
Research Institution | Tokyo Institute of Technology |
Principal Investigator | YAJIMA Yasutoshi, Tokyo Institute of Technology, Graduate School of Decision Science and Technology, Associate Professor (80231645) |
Project Period (FY) | 2004 – 2005 |
Keywords | Classification / Cutting Planes / Supervised learning / Kernel matrix / Data mining |
Research Abstract |
We propose an SVM-based feature ranking and selection method. The procedure exploits several properties of SVMs with RBF kernel functions to compute vectors lying on the discriminant boundary. We show that these vectors, and their gradient vectors, can be computed efficiently using only elementary matrix and vector operations. Numerical experiments on the Reuters-21578 dataset show that the proposed method achieves higher classification performance than rankings based on LSI and $\chi^2$ statistics.

We have also introduced semi-supervised learning approaches for partially labeled data points. Our approaches exploit the manifold structure of the given data points, characterized by a weighted graph and its associated Laplacian. We show that a number of conventional SVM frameworks, such as the hard margin formulation and the 1-norm and 2-norm soft margin formulations, extend naturally to the semi-supervised setting. The resulting formulations are simple convex quadratic programming problems. The sparse structure of the graph Laplacian allows the problems to be solved in a practical amount of computational time even when the number of variables, i.e., the number of data points, is very large. Moreover, we show that several existing Laplacian-based approaches can be seen as special cases of our framework. Numerical experiments indicate that our approaches perform well on some data sets. Our future plans include experiments on much larger data sets.
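The abstract mentions two computational ingredients: the gradient of an RBF-kernel SVM decision function, used for feature ranking, and a sparse graph Laplacian, used for the semi-supervised extensions. The sketch below illustrates the first ingredient only in outline; the use of scikit-learn, the choice of support vectors as proxies for boundary points, and all function names and parameters are assumptions made for illustration, not the report's actual procedure.

```python
# Hedged sketch: ranking features by the gradient of an RBF-kernel SVM
# decision function.  scikit-learn, the use of support vectors as proxies
# for boundary points, and all names below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def rbf_decision_gradient(svc, X, gamma):
    """Gradient of f(x) = sum_i c_i exp(-gamma ||s_i - x||^2) + b at each row of X,
    where s_i are the support vectors and c_i = alpha_i * y_i the dual coefficients."""
    S = svc.support_vectors_              # (n_sv, d)
    c = svc.dual_coef_.ravel()            # (n_sv,)
    grads = np.zeros_like(X, dtype=float)
    for k, x in enumerate(X):
        diff = S - x                                        # (n_sv, d)
        w = c * np.exp(-gamma * np.sum(diff ** 2, axis=1))  # weighted kernel values
        grads[k] = 2.0 * gamma * (w @ diff)                 # d/dx of the RBF terms
    return grads

def rank_features(X, y, gamma=0.5, C=1.0):
    """Order features by mean absolute gradient component over the support vectors."""
    svc = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    g = rbf_decision_gradient(svc, svc.support_vectors_, gamma)
    return np.argsort(-np.abs(g).mean(axis=0))
```

For the semi-supervised part, the abstract states that several existing Laplacian-based approaches are special cases of the proposed framework. The sketch below shows one such classical special case, the harmonic-function solution on a sparse k-nearest-neighbor graph Laplacian; it is not the report's 1-norm or 2-norm soft margin formulation, and the graph construction and parameter choices are illustrative assumptions. It does, however, show how the sparsity of the Laplacian keeps the linear-algebra step tractable when the number of data points is large.

```python
# Hedged sketch: the classical harmonic-function solution on a sparse
# k-nearest-neighbor graph Laplacian, one of the existing Laplacian-based
# approaches the abstract cites as a special case of the framework.
# Graph construction, k, and all names are illustrative assumptions.
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import spsolve
from sklearn.neighbors import kneighbors_graph

def harmonic_labels(X, labeled_idx, y_labeled, n_neighbors=10):
    """Predict +/-1 labels for unlabeled rows of X via the sparse graph Laplacian."""
    n = X.shape[0]
    W = kneighbors_graph(X, n_neighbors, mode="connectivity")
    W = 0.5 * (W + W.T)                       # symmetrize the kNN graph
    L = laplacian(W).tocsr()                  # sparse graph Laplacian L = D - W
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    # Partition L and solve L_uu f_u = -L_ul y_l for the unlabeled scores f_u.
    L_uu = L[unlabeled_idx][:, unlabeled_idx].tocsc()
    L_ul = L[unlabeled_idx][:, labeled_idx]
    f_u = spsolve(L_uu, -L_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, np.sign(f_u)        # indices and predicted classes
```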
Research Products (10 results)