2019 Fiscal Year Annual Research Report
Data Mining for Graphs and Networks via Local Intrinsic Dimensional Modeling
Project/Area Number | 18H03296 |
Research Institution | National Institute of Informatics |
Principal Investigator | Michael E. Houle, National Institute of Informatics, Visiting Professor (90399270) |
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | High-dimensional spaces / Extreme value theory / Data mining / Machine learning / Neural networks |
Outline of Annual Research Achievements |
1. Publication of a refereed paper at the top-tier ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2019). This paper presented a technique for generating more realistic synthetic neighbors for the purpose of explaining learned solutions in deep learning. The improvements to explainability were due to the use of the local intrinsic dimensionality (LID) model central to this project. (Acceptance rate: 14%.)
2. Publication of a refereed paper at the top-tier SIAM International Conference on Data Mining (SDM 2019). This paper proposed an estimator of LID that makes use of all pairwise distances within a local neighborhood sample. The use of full similarity neighborhoods allows significantly higher-quality estimation within more tightly-focused neighborhoods. With this estimator, we are targeting applications in anomaly and outlier detection in data mining, among others. (Acceptance rate: 23%.)
3. Two other refereed publications at international venues: (SISAP 2019 conference) the use of decompositions of LID to determine relevant local data subspaces; (IJAIT journal) an empirical analysis of the NN-descent similarity search method in terms of local neighborhood imbalance (the hubness phenomenon, which is related to extreme LID scores).
4. Two unrefereed publications (arXiv): a full version of the SISAP 2019 paper, and work in progress on the use of LID-based regularization for improving the quality of GAN-based deep learning.
|
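For background on the kind of estimation discussed above: the pairwise-distance estimator of the SDM 2019 paper is not reproduced here, but a minimal sketch of the widely used maximum-likelihood (Hill-type) LID estimator, computed from a point's nearest-neighbor distances, illustrates the basic idea. The function name `lid_mle` and the uniform-ball sanity check are illustrative assumptions, not code from the project.

```python
import numpy as np

def lid_mle(dists):
    """Maximum-likelihood (Hill-type) LID estimate from the distances
    to a point's k nearest neighbors."""
    d = np.sort(np.asarray(dists, dtype=float))
    w = d[-1]  # distance to the k-th (furthest) neighbor
    return -1.0 / np.mean(np.log(d / w))

# Sanity check: distances from the center of a uniform D-dimensional ball
# follow F(r) = r^D, so the true LID at the center is exactly D.
rng = np.random.default_rng(0)
D, n, k = 8, 20000, 200
radii = rng.random(n) ** (1.0 / D)  # distances of n uniform samples to the center
neighbors = np.sort(radii)[:k]      # the k smallest act as nearest-neighbor distances
est = lid_mle(neighbors)            # should be close to D = 8
```

Estimators of this family converge slowly in tightly-focused (small-k) neighborhoods, which is the limitation that motivates the use of all pairwise distances within the neighborhood sample.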
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The KDD 2019 paper on the explainability of deep learning solutions, and the arXiv paper on LID-based regularization for improving the quality of GAN-based deep learning, are both important examples of how LID can be used to both guide and explain high-quality solutions within neural network-based learning in general, and deep learning in particular. These, as well as the SDM 2019 paper on high-quality tight LID estimation, are all evidence of the continued impact of the LID model at the highest levels of the field. As further evidence, LID is beginning to be taken up as a modeling tool by researchers unconnected to this project. Current Google Scholar citation counts of the top international publications arising from this project are now 170 (ICLR 2018) and 82 (ICML 2018). The LID model was highlighted at the SISAP 2019 conference, where the best paper award went to work done by researchers unassociated with this project; the topic of their paper was the impact of LID in predicting the quality (performance versus accuracy) of query results. The goal of establishing our research results within the community is being met. Among my recent reviews for the top-tier IJCAI 2019 international conference, 3 out of the 7 submissions I was assigned to review had independently adopted LID to excellent effect in deep neural network (DNN) applications. All of this indicates that one of the most important objectives of the project, to establish LID as an essential, standard model for DNNs, has already been met. For the remainder of the project, we will seek to build upon this already very satisfying start.
|
Strategy for Future Research Activity |
The COVID-19 crisis has presented difficulties for all aspects of this project, due to its reliance on collaboration between researchers at institutions in 5 different countries. Taking this into account, our objectives for FY2020 are:
(1) In FY2019, we developed estimators for a decomposition of LID. Our initial work showed that these estimators can help determine low-dimensional yet discriminative feature subsets. This year, we will further develop the LID model and derive techniques from it to help identify impactful local features for subspace applications in machine learning and data mining.
(2) In FY2018 and FY2019, we verified the use of LID for applications in deep neural network learning, including classification and adversarial detection, with a particular emphasis on generative adversarial networks (GANs). In FY2020, we will further develop our LID-based regularization techniques for GANs.
(3) In FY2019, we developed new, sharp, tight estimators for LID. In FY2020, we will make use of these estimators in the implementation of a new LID-based theoretical model for outlier detection, whose development began late last year. Initial results look extremely promising.
(4) We had postponed our plan for a third NII Shonan Meeting on Dimensionality and Scalability until 2020. We will now look to hold this meeting in February or March 2021 if possible, as a wrap-up to the project and as a means of exploring further outcomes from this line of work.
|
[Journal Article] The Influence of Hubness on NN-Descent (2019)
Author(s): Bratic Brankica, Houle Michael E., Kurbalija Vladimir, Oria Vincent, Radovanovic Milos
Journal Title: International Journal on Artificial Intelligence Tools
Volume: 28, Pages: 1960002-1960002
DOI
Peer Reviewed / Int'l Joint Research