2016 Fiscal Year Annual Research Report
Practical and Effective Data Mining Via Local Intrinsic Dimensional Modeling
Project/Area Number | 15H02753 |
Research Institution | National Institute of Informatics |
Principal Investigator | Michael E. Houle, National Institute of Informatics, Inter-University Research Institute Departments, Visiting Professor (90399270) |
Project Period (FY) | 2015-04-01 – 2018-03-31 |
Keywords | High-dimensional space / Extreme value theory / Data mining |
Outline of Annual Research Achievements |
1. Publication of a refereed international journal paper in Proceedings of the VLDB Endowment (PVLDB). The paper proposes an efficient method for the important yet difficult problem of reverse k-nearest neighbor search. Our solution uses run-time optimization (including early termination) guided by an intrinsic dimensional testing criterion. The method significantly outperforms its competitors, particularly in that it can use existing estimators of local intrinsic dimensionality (LID) to autotune its heuristic choices. The paper will also be presented at the associated top-tier international VLDB conference in August 2017.
2. One refereed international journal paper on similarity search within projected subspaces, where the features identifying the subspace are supplied at query time. Here too, dimensional testing is employed to accelerate performance.
3. One refereed international conference paper on the use of LID to measure dependency in data. The paper shows that the LID-based criterion can simultaneously identify multiple functional relationships in real data that conventional measures cannot handle.
4. Submissions to international conferences and journals, including: foundational work on LID and extreme value theory (EVT); improved EVT-based estimators of local ID; an explanation of the adversarial perturbation effect in deep learning classification; and a theoretical analysis of the effect of projection on LID. These papers are currently in submission, in resubmission, or under revision.
5. Successful organization in Tokyo of the 9th International Conference on Similarity Search and Applications (SISAP 2016).
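The report does not specify which existing LID estimators are used for autotuning. For orientation only, the following is a minimal sketch of one widely used estimator of this kind: the maximum-likelihood (Hill-type) estimator derived from extreme value theory, computed from a query's k nearest-neighbor distances. It is not the project's improved or tight estimator, and the function name lid_mle and its input convention are illustrative assumptions.

```python
import math
from typing import Sequence


def lid_mle(knn_distances: Sequence[float]) -> float:
    """Maximum-likelihood (Hill-type) estimate of local intrinsic dimensionality.

    Illustrative sketch: knn_distances are the distances from a query point to
    its k nearest neighbours (the query itself excluded), all strictly positive.
    Estimate: LID = -( (1/k) * sum_i log(r_i / r_k) )^(-1), r_k = k-th NN distance.
    """
    dists = sorted(knn_distances)
    if len(dists) < 2:
        raise ValueError("need at least two neighbour distances")
    if dists[0] <= 0:
        raise ValueError("distances must be strictly positive")
    r_max = dists[-1]  # distance to the k-th (farthest) nearest neighbour
    # Mean of log(r_i / r_k); every term is <= 0 and the k-th term is exactly 0.
    mean_log = sum(math.log(d / r_max) for d in dists) / len(dists)
    if mean_log == 0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -1.0 / mean_log
```

Because the estimate depends only on ratios of distances, rescaling all distances by a constant leaves it unchanged; in practice the input would come from any exact or approximate k-NN query.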
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is progressing essentially as planned. The work on applying local ID to dependency measures was successful and has been published at an international conference. Work on clustering and outlier detection resumed late in FY2016 (H28) with the arrival of a student dedicated to this topic. The application to reverse k-nearest neighbor search has appeared in a prestigious international journal (PVLDB). The performance of our tight estimators for local ID has exceeded expectations, but this work remains as yet unpublished. However, there have been some changes to the schedule, for the following reasons:
1. We have found that conference reviewers are less receptive to theoretical contributions than to applications. As a result, application papers have been accepted at international conferences, while the publication of the underlying theory papers has been delayed. We therefore plan to submit pending and future theoretical work almost exclusively to international journals.
2. We have identified deep neural networks as a very important potential application of local intrinsic dimensional analysis, and have reprioritized our work accordingly. The related theoretical work on adversarial perturbation has been extended and is being prepared for resubmission.
3. The collaborators from Ludwig-Maximilians-Universitaet (LMU) in Munich, Germany have accepted positions at new institutions: Dr. Arthur Zimek has joined the University of Southern Denmark, and Dr. Erich Schubert is now at the University of Heidelberg. Both will continue to collaborate on this project.
Strategy for Future Research Activity |
Research plan for FY2017:
1. Finalization of the development of improved estimators for local ID. Although we had achieved our performance targets for FY2016, we have recently achieved far greater improvements still, which will require our experiments to be redone.
2. Finalization of the work on improving the performance of DBSCAN clustering and LOF outlier detection using local ID.
3. Continuation of foundational theoretical work on local ID and its relationship to feature selection. This work will likely continue productively beyond the end of this project, in both theoretical and applied directions.
4. Further investigation of how local ID can account for the performance of deep neural networks and for the effect of adversarial perturbation. We have already seen that a "distillation" heuristic for deep learning can be reformulated in terms of multivariate local ID. This work is exploratory in nature and may serve as the foundation of a large follow-on project starting in FY2018.