2009 Fiscal Year Final Research Report
A Study on Rate-distortion Theory-based Learning and its Application for Advanced Cluster Analyses
Project/Area Number |
20700132
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Single-year Grants |
Research Field |
Intelligent informatics
|
Research Institution | Gunma University |
Principal Investigator |
ANDO Shin Gunma University, Graduate School of Engineering, Assistant Professor (70401685)
|
Project Period (FY) |
2008 – 2009
|
Keywords | Knowledge discovery and data mining |
Research Abstract |
Our study extended the learning method founded on Rate-Distortion (RD) theory to practical, leading-edge problems in data mining and machine learning. One concrete achievement is the formalization of RD learning for time series data described by multivariate polynomial regression models and Markov chains. As a result, we developed a methodology for anomaly detection in dynamic systems. We validated our methods with microarray time series data and were able to detect the active state of the network with significantly higher precision and recall than conventional methods. These results were published at KDD, one of the major conferences in machine learning and data mining. With respect to time series data mining, we constructed benchmark datasets for different domains, including microarray data, financial time series, and robot trajectories. We developed an efficient instance-based method for online outlier detection based on multi-perspective ensemble learning. This result was presented at a Japanese workshop and submitted to an international conference. We also extended the RD formalization to transfer learning problems, addressing multiple, heterogeneous data sources, and developed a methodology for regularized learning in unsupervised transfer learning. The main concrete result is the clustering of heterogeneous text data, where significantly higher precision and recall were achieved in comparison to conventional methods. We further extended RD learning to integrate geometric structures into the regularization framework. To validate the proposed approach, we prepared benchmark data from bibliographical data annotated with co-author graph information. We applied ItGA, an information-theoretic Geo-topico analysis, and discovered better topics than the popular PLSA and LDA methods. ItGA was also significantly better as a dimensionality reduction method for extracting important features of text. These results were published at ICDM and are in submission to other major data mining conferences.
|