Summary of Research Achievements |
1. Publication of a refereed full paper at the top-tier ACM SIGKDD Conference on Knowledge Discovery and Data Mining. The paper proposes, analyzes, and evaluates several efficient estimators of local intrinsic dimensionality (ID) based on Extreme Value Theory (EVT), and lays the practical foundation for the applications of ID to be investigated in this project.
2. Two refereed international full papers (one conference, one journal) on flexible aggregate similarity search (FANN). State-of-the-art solutions for FANN are obtained using dimensional testing, which uses estimates of ID at runtime to control the tradeoff between execution time and query accuracy.
3. An empirical study of unsupervised outlier detection measures, datasets, and methods, published in a top refereed international journal. Although this work is not specific to ID, its experimental framework will be used to evaluate our ID-based outlier detection methods.
4. Submissions to international conferences and journals, including: foundational work on the theoretical connections among local ID, second-order EVT, clustering, and outlier detection; a new EVT estimator for local ID that greatly improves upon our own work in item 1; a measure of the dependency of random variables based on local ID; and an explanation of the adversarial perturbation effect in deep learning classification, according to which the vulnerability of a classifier to adversarial attack increases with the local ID.
5. Successful organization of the second NII Shonan Meeting on Dimensionality and Scalability.
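As an illustration of the kind of EVT-based local ID estimation referred to in item 1, the sketch below implements the standard maximum-likelihood (Hill-type) estimator, which takes the distances r_1 <= ... <= r_k from a query point to its k nearest neighbors and returns -1 / mean(ln(r_i / r_k)). The report does not say which estimators the paper covers, so this choice is an assumption, and the function names `lid_mle` and `knn_distances` are hypothetical.

```python
import numpy as np


def lid_mle(distances):
    """Hill-type maximum-likelihood estimate of local intrinsic dimensionality.

    `distances` holds the distances from one query point to its k nearest
    neighbors. The estimate is -1 / mean(ln(r_i / r_k)), where r_k is the
    largest of the supplied distances.
    """
    r = np.sort(np.asarray(distances, dtype=float))
    r = r[r > 0.0]  # zero distances (duplicates of the query) carry no information
    if r.size < 2:
        raise ValueError("need at least two positive distances")
    mean_log_ratio = np.mean(np.log(r / r[-1]))
    if mean_log_ratio == 0.0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -1.0 / mean_log_ratio


def knn_distances(data, query, k):
    """Euclidean distances from `query` to its k nearest neighbors in `data`.

    Points at distance zero are treated as copies of the query and skipped.
    """
    d = np.linalg.norm(np.asarray(data, dtype=float) - np.asarray(query, dtype=float), axis=1)
    d = d[d > 0.0]
    if k > d.size:
        raise ValueError("k exceeds the number of available neighbors")
    return np.partition(d, k - 1)[:k]


if __name__ == "__main__":
    # Points drawn uniformly from a 5-dimensional cube: the estimate at an
    # interior query point should be in the vicinity of 5.
    rng = np.random.default_rng(0)
    points = rng.random((20000, 5))
    query = np.full(5, 0.5)
    print(lid_mle(knn_distances(points, query, k=100)))
```

The estimate depends on the neighborhood size k: small k gives high variance, while large k may reach beyond the region in which the local distance distribution is well approximated by a power law.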
|
Current Status of Research Progress (Category) |
2: Progressing more or less smoothly
Reason
The project is progressing essentially as planned. The schedule has been rearranged somewhat, for the following two reasons:
1. We have found that follow-on applications such as clustering and classification are very sensitive to the quality of the ID estimators used. For this reason, we have accelerated work on sharper ID estimation using local pairwise distances, with excellent results (currently under submission to a top conference). Some of the application work has been delayed so that it can make use of these new estimators.
2. We have made what we believe is a major discovery regarding the nature of adversarial attacks on classification. It has been known for several years that image classifiers are vulnerable to a form of subversion in which an adversary perturbs the image to be classified with an amount of noise that is imperceptible to human eyes, yet causes the image to be misclassified as a class of the adversary's choosing. This has enormous implications for security: for example, an adversary could create a false passport in which the image appears to match the adversary's own face, but is recognized by a computer system as belonging to someone else. We have discovered that the vulnerability of nearest-neighbor classifiers is correlated with the local ID of the object in the domain space. The implication is that classifiers built on high-dimensional feature vectors are inherently vulnerable to this form of attack. As a result of this breakthrough, we have given higher priority to the issues of privacy and security in our investigations.
|
Strategy for Future Research Activity |
Research plan for FY2016:
1. Further advancement of the theory of intrinsic dimensionality (ID), and development of improved estimators for local ID. Continuation of the study of the connections among ID, hubness, and shared neighbor similarity (SNN), using the estimators of ID developed and published in FY2015.
2. Application of the theory of local ID to the measurement of dependency between random variables, and to the effects of data perturbation on privacy preservation and on adversarial attacks against data classification. For this work, we are adding as collaborators researchers and students from the groups of Prof. James Bailey of the University of Melbourne (Australia) and Prof. Vladimir Estivill-Castro of Griffith University (Australia).
3. Development of more efficient and more effective solutions for unsupervised applications in data mining and multimedia, by redesigning traditional methods for data clustering, anomaly detection, feature selection, and variants of similarity search in light of the theoretical models. These methods include DBSCAN clustering and the LOF outlier detection paradigm. This work will continue into FY2017.
4. Creation and promotion of a new interdisciplinary international research community. We will build upon the successful NII Shonan Meeting on Dimensionality and Scalability II, held in Hayama, Japan, by organizing the 9th International Conference on Similarity Search and Applications (SISAP 2016) in Tokyo.
|