Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2009: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2008: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
|
Research Abstract |
Our study extended the theoretically principled Rate-Distortion (RD) learning method to practical, leading-edge problems in data mining and machine learning. One concrete achievement is the formalization of RD learning for time series data described by multivariate polynomial regression models and Markov chains. As a result, we developed a methodology for anomaly detection in dynamic systems. We validated our methods with microarray time series data and were able to detect the active state of the network with significantly higher precision and recall than conventional methods. These results were published at KDD, one of the major conferences in machine learning and data mining. With respect to time series data mining, we constructed benchmark datasets for different domains, including microarray data, financial time series, and robot trajectories. We developed an efficient instance-based method for online outlier detection based on multi-perspective ensemble learning. This result was presented at a Japanese workshop and submitted to an international conference. We also extended the RD formalization to transfer learning problems, addressing multiple, heterogeneous data sources, and developed a methodology for regularized learning in unsupervised transfer learning. The main concrete result is the clustering of heterogeneous text data, where significantly higher precision and recall were achieved in comparison to conventional methods. We further extended RD learning to integrate geometric structures into the regularization framework. To validate the proposed approach, we prepared a benchmark dataset from bibliographical data annotated with co-author graph information. We applied ItGA, an information-theoretic Geo-topico analysis, and discovered better topics than the popular PLSA and LDA methods. ItGA was also significantly better as a dimensionality reduction method for extracting important features of text. These results have been published at ICDM and are in submission to other major data mining conferences.
|