2018 Fiscal Year Annual Research Report
Data Mining for Graphs and Networks via Local Intrinsic Dimensional Modeling
Project/Area Number | 18H03296 |
Research Institution | National Institute of Informatics |
Principal Investigator | Michael E. Houle, National Institute of Informatics, Department of an Inter-University Research Institute, Visiting Professor (90399270) |
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | High-dimensional space / Extreme value theory / Data mining / Machine learning / Neural networks |
Outline of Annual Research Achievements |
1. Publication of a refereed paper at the top-tier International Conference on Learning Representations (ICLR 2018). This paper presented a characterization of corrupted examples in adversarial attacks on classification systems, together with a practical detection method, both based on the local intrinsic dimensionality (LID) model central to this project. (Acceptance rate: 2.5%)
2. Publication of a refereed paper at the top-tier International Conference on Machine Learning (ICML 2018). This paper demonstrated that the progress of learning in deep neural network (DNN) classifiers is strongly correlated with a drop in LID at the deep feature level, and showed how to use this effect to prevent overtraining and overfitting to data. (World first in automatic detection and avoidance of overtraining during DNN learning.)
3. Four other refereed publications at international conferences. Three appeared at SISAP 2018: on the correlation between LID and outlierness, on the use of LID to accelerate data fingerprinting, and on an adaptation of LID to model the local growth rate of search neighborhoods within graphs. One appeared at WIMS 2018: an examination of the effect of reverse neighborhood imbalance in similarity graph construction.
4. One refereed publication in a top international journal, expanding on earlier work on LID estimation.
5. Two top-tier international conference publications accepted for presentation in FY2019, on the topics of tight LID estimation and the use of LID to generate more realistic neighboring examples for the explainability of DNN classification.
|
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
The ICLR 2018 paper on adversarial characterization and detection and the ICML 2018 paper on the characterization and prevention of overfitting each use the local intrinsic dimensionality (LID) model for both the theoretical explanation and the practical management of deep learning classification. Both papers have made an impact at the highest levels of the field, and have introduced the LID model to the international research community much sooner, and on a larger scale, than expected. In the first 13 months after publication, the ICLR paper (2.5% acceptance rate) was cited 65 times; in the first 10 months after publication, the ICML paper was cited 23 times. Among the 7 submissions I was recently assigned to review for the top-tier IJCAI 2019 international conference, 3 had independently adopted LID to excellent effect in deep neural network (DNN) applications. All this indicates that one of the most important objectives of the project, to establish LID as an essential, standard model for DNNs, has already been met. For the remainder of the project, we will seek to build upon this very satisfying start.
|
Strategy for Future Research Activity |
Within FY2019:
(1) In FY2018 we developed estimators for a decomposition of LID, following theory published in 2017. Our initial work showed that these estimators can help determine low-dimensional yet discriminative feature subsets. This year, we will work on sharpening these estimators so that they can support feature ranking for subspace clustering applications.
(2) In FY2018 we verified the use of LID for applications in deep neural network learning, including classification and adversarial detection. In FY2019, we will extend this work to other neural network applications, including generative adversarial networks.
(3) Work in FY2018 revealed the importance of sharp, tight estimation of LID in application areas. In FY2019, we will establish new estimation techniques for database, data mining and multimedia settings in which only relatively few sample points may be used; examples will include outlier detection and recommender systems.
(4) We will make technical innovations available to researchers and practitioners by integrating fundamental tools based on LID into practical systems. In FY2018, we had hoped to integrate an effective new estimator of LID into the ELKI data mining framework. This estimator is now fully designed, and we will propose it for ELKI this year.
(5) We will further promote the interdisciplinary international research community by proposing a third NII Shonan Meeting on Dimensionality and Scalability for 2020. This meeting was anticipated for 2019, but we have decided to postpone it due to the circumstances of some of the prospective organizers.
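For readers unfamiliar with LID estimation, the following is a minimal sketch of one commonly used maximum-likelihood (Hill-type) estimator, which computes an LID value from the distances between a query point and its k nearest neighbors. It is offered as an illustration only: this report does not specify which estimators the project uses, and the function name `lid_mle` is hypothetical.

```python
import math


def lid_mle(distances):
    """Hill-type maximum-likelihood estimate of LID (illustrative sketch).

    `distances` holds the distances from a query point to its k nearest
    neighbors. The estimate is -1 / mean(log(r_i / r_max)), where r_max
    is the largest of those distances.
    """
    # Zero distances (duplicate points) would make the logarithm undefined.
    r = sorted(d for d in distances if d > 0)
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    r_max = r[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in r) / len(r)
    if mean_log_ratio == 0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -1.0 / mean_log_ratio


if __name__ == "__main__":
    # Idealized neighbor distances for uniformly distributed data in 3 dimensions:
    # the i-th of k neighbors lies at radius (i / k) ** (1 / 3).
    k, dim = 1000, 3
    sample = [(i / k) ** (1.0 / dim) for i in range(1, k + 1)]
    print(lid_mle(sample))  # close to 3 for this idealized input
```

With small neighborhoods the estimate becomes noisy, which is one way to read the emphasis above on tight estimation from relatively few sample points.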
|
[Journal Article] NN-Descent on High-Dimensional Data (2018)
Author(s)
Brankica Bratic, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, Milos Radovanovic
Journal Title
8th International Conference on Web Intelligence, Mining and Semantics (WIMS 2018)
Volume: 8
Pages: 20:1-20:8
DOI
Peer Reviewed / Int'l Joint Research