Budget Amount |
¥14,300,000 (Direct Cost: ¥14,300,000)
Fiscal Year 2005: ¥5,400,000 (Direct Cost: ¥5,400,000)
Fiscal Year 2004: ¥8,900,000 (Direct Cost: ¥8,900,000)
|
Research Abstract |
We have invented a data mining method for multi-viewpoint, multi-granularity knowledge discovery that summarizes the essential parts of data with probabilistic clustering and allocates hues based on information criteria. The method extends our PrototypeLines, whose effectiveness has been demonstrated on medical test data. We investigated its effectiveness on Web page data, a typical example of text and image data, and showed that it is superior to Google in terms of recall, precision, and computational time. The method was then improved and extended into its final form, whose effectiveness was evaluated quantitatively by applying it to Web page data and network intrusion data. The experiments with Web page data addressed the task of grasping the content of a large number of Web pages from a visualization result printed on a single sheet of A4 paper. Because subjects were asked many questions within a limited time, we adopted the number of correct answers as the evaluation index, and our method increased the value of this index by 35% compared with Google. Although specific routines for images and keywords are necessary, we consider that we have accomplished the initial objective of visualizing information with appropriate viewpoints and granularities for knowledge discovery. For the experiments with network intrusion data, we chose prediction problems based on Web page access logs. Excellent results were obtained in terms of recall and precision for malicious access detection, discovery of peculiar fraudulent access, and comprehensiveness of the visualization results. In the process, we developed a multi-objective search method, an information evaluation index, and clustering methods for predicate logic data, and confirmed their effectiveness. In addition, we developed visualization methods for transactional data of itemsets in cooperation with the University of Caen in France and obtained excellent results. Applications to various spatio-temporal data, of which soccer data is representative, have also been pursued, with excellent results in both visualization and knowledge discovery.
|