2019 Fiscal Year Annual Research Report
Data Mining for Graphs and Networks via Local Intrinsic Dimensional Modeling
Project/Area Number | 18H03296 |
Research Institution | National Institute of Informatics |
Principal Investigator | Michael E. Houle, National Institute of Informatics, Visiting Professor (90399270) |
Project Period (FY) | 2018-04-01 – 2021-03-31 |
Keywords | High-dimensional spaces / Extreme value theory / Data mining / Machine learning / Neural networks |
Outline of Annual Research Achievements |
1. Publication of a refereed paper at the top-tier ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2019). This paper presented a technique for generating more realistic synthetic neighbors for the purpose of explaining learned solutions in deep learning. The improvements to explainability were due to the use of the local intrinsic dimensionality (LID) model central to this project. (Acceptance rate: 14%.)
2. Publication of a refereed paper at the top-tier SIAM International Conference on Data Mining (SDM 2019). This paper proposed an estimator of LID that makes use of all pairwise distances within a local neighborhood sample. The use of full similarity neighborhoods allows significantly higher-quality estimation within more tightly-focused neighborhoods. With this estimator, we are targeting applications in anomaly and outlier detection in data mining, among others. (Acceptance rate: 23%.)
3. Two other refereed publications at international venues: (SISAP 2019 conference) the use of decompositions of LID to determine relevant local data subspaces; (IJAIT journal) an empirical analysis of the NN-descent similarity search method in terms of local neighborhood imbalance (the hubness phenomenon, which is related to extreme LID scores).
4. Two unrefereed publications (arXiv): a full version of the SISAP 2019 paper, and work in progress on the use of LID-based regularization for improving the quality of GAN-based deep learning.
|
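For background on the kind of estimation discussed above: the pairwise-distance estimator of the SDM 2019 paper is not reproduced here, but a minimal sketch of the widely used maximum-likelihood (Hill-type) LID estimator, computed from a point's nearest-neighbor distances, illustrates the basic idea. The function name `lid_mle` and the uniform-ball sanity check are illustrative assumptions, not code from the project.

```python
import numpy as np

def lid_mle(dists):
    """Maximum-likelihood (Hill-type) LID estimate from the distances
    to a point's k nearest neighbors."""
    d = np.sort(np.asarray(dists, dtype=float))
    w = d[-1]  # distance to the k-th (furthest) neighbor
    return -1.0 / np.mean(np.log(d / w))

# Sanity check: distances from the center of a uniform D-dimensional ball
# follow F(r) = r^D, so the true LID at the center is exactly D.
rng = np.random.default_rng(0)
D, n, k = 8, 20000, 200
radii = rng.random(n) ** (1.0 / D)  # distances of n uniform samples to the center
neighbors = np.sort(radii)[:k]      # the k smallest act as nearest-neighbor distances
est = lid_mle(neighbors)            # should be close to D = 8
```

Estimators of this family converge slowly in tightly-focused (small-k) neighborhoods, which is the limitation that motivates the use of all pairwise distances within the neighborhood sample.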
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The KDD 2019 paper on the explainability of deep learning solutions, and the arXiv paper on LID-based regularization for improving the quality of GAN-based deep learning, are both important examples of how LID can be used to both guide and explain high-quality solutions within neural network-based learning in general, and deep learning in particular. These, as well as the SDM 2019 paper on high-quality tight LID estimation, are all evidence of the continued impact of the LID model at the highest levels of the field. As further evidence, LID is beginning to be taken up as a modeling tool by researchers unconnected to this project. Current Google Scholar citation counts of the top international publications arising from this project are now 170 (ICLR 2018) and 82 (ICML 2018). The LID model was highlighted at the SISAP 2019 conference, where the best paper award went to work done by researchers unassociated with this project; the topic of their paper was the impact of LID in predicting the quality (performance versus accuracy) of query results. The goal of establishing our research results within the community is being met. Among my recent reviews for the top-tier IJCAI 2019 international conference, 3 out of the 7 submissions I was assigned to review had independently adopted LID to excellent effect in deep neural network (DNN) applications. All of this indicates that one of the most important objectives of the project, to establish LID as an essential, standard model for DNNs, has already been met. For the remainder of the project, we will seek to build upon this already very satisfying start.
|
Strategy for Future Research Activity |
The COVID-19 crisis has presented difficulties for all aspects of this project, due to its reliance on collaboration between researchers at institutions in 5 different countries. Taking this into account, our objectives for FY2020 are:
(1) In FY2019, we developed estimators for a decomposition of LID. Our initial work showed that these estimators can help determine low-dimensional yet discriminative feature subsets. This year, we will further develop the LID model and derive techniques from it to help identify impactful local features for subspace applications in machine learning and data mining.
(2) In FY2018 and FY2019, we verified the use of LID for applications in deep neural network learning, including classification and adversarial detection, with a particular emphasis on generative adversarial networks (GANs). In FY2020, we will further develop our LID-based regularization techniques for GANs.
(3) In FY2019, we developed new, sharp, tight estimators for LID. In FY2020, we will make use of these estimators in the implementation of a new LID-based theoretical model for outlier detection, whose development began late last year. Initial results look extremely promising.
(4) We had postponed our plan for a third NII Shonan Meeting on Dimensionality and Scalability until 2020. We will now look to hold this meeting in February or March 2021 if possible, as a wrap-up to the project and as a means of exploring further outcomes from this line of work.
|
[Journal Article] The Influence of Hubness on NN-Descent (2019)
Author(s): Bratic Brankica, Houle Michael E., Kurbalija Vladimir, Oria Vincent, Radovanovic Milos
Journal Title: International Journal on Artificial Intelligence Tools
Volume: 28, Pages: 1960002-1960002
DOI
Peer Reviewed / Int'l Joint Research