Development of clustering method for large and complex data and its theoretical properties

Research Project

Project/Area Number	20K19756
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60030:Statistical science-related
Research Institution	Osaka University
Principal Investigator	Terada Yoshikazu, Osaka University, Graduate School of Engineering Science, Associate Professor (10738793)
Project Period (FY)	2020-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000); Fiscal Year 2022: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000); Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Keywords	clustering / speed-up / unsupervised learning / large-scale data / computational cost reduction / asymptotic theory / dynamic programming
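Outline of Research at the Start	In recent years, as data have grown larger and more complex, the importance of unsupervised classification problems has been recognized anew. However, only simple methods with low computational cost, such as k-means, are applied to large-scale data, so the classification structure underlying the data may not be adequately captured. This research proposes clustering methods that can capture complex cluster structures and can also be run quickly on large-scale data, and provides theoretical guarantees for them.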
Outline of Final Research Achievements	In this study, we developed a general computational cost reduction method for large-scale clustering and established its theoretical properties. We also developed a fast algorithm for convex clustering, which can flexibly capture hierarchical group structures. With the proposed methods, complex clustering techniques can be applied to more than a million data points within one minute, even on a laptop. This enables rapid estimation of the underlying cluster structures in large and complex data.
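Academic Significance and Societal Importance of the Research Achievements	As data have grown larger and more complex in recent years, clustering methods for discovering group structures in data have become increasingly important. Until now, however, the only clustering methods applicable to large-scale data were those that capture simple cluster structures. The results of this research make it possible, in any field that requires clustering, to estimate complex cluster structures from large-scale data quickly and easily. Applying this research is expected to lead to new findings in a variety of application fields.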

Report

(5 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Research-status Report
2021 Research-status Report
2020 Research-status Report

Research Products
(24 results)

Int'l Joint Research (3 results) / Journal Article (8 results; of which Int'l Joint Research: 1 result, Peer Reviewed: 8 results, Open Access: 5 results) / Presentation (13 results; of which Int'l Joint Research: 5 results, Invited: 11 results)

[Int'l Joint Research] Erasmus University Rotterdam (Netherlands)
- Related Report
  2023 Annual Research Report
[Int'l Joint Research] Erasmus University Rotterdam (Netherlands)
- Related Report
  2022 Research-status Report
[Int'l Joint Research] Erasmus University Rotterdam (Netherlands)
- Related Report
  2020 Research-status Report
[Journal Article] Sparse kernel k-means for high-dimensional data (2023)
- Author(s)
  Guan Xin, Terada Yoshikazu
- Journal Title
  Pattern Recognition
  Volume: 144 Pages: 109873
- DOI
  10.1016/j.patcog.2023.109873
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Selective inference after feature selection via multiscale bootstrap (2022)
- Author(s)
  Terada Yoshikazu, Shimodaira Hidetoshi
- Journal Title
  Annals of the Institute of Statistical Mathematics
  Volume: 75 Issue: 1 Pages: 99-125
- DOI
  10.1007/s10463-022-00838-2
- Related Report
  2022 Research-status Report
- Peer Reviewed
[Journal Article] Sparse and Simple Structure Estimation via Prenet Penalization (2022)
- Author(s)
  Hirose Kei, Terada Yoshikazu
- Journal Title
  Psychometrika
  Volume: 1 Issue: 4 Pages: 1-26
- DOI
  10.1007/s11336-022-09868-4
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Forecasting temporal variation of aftershocks immediately after a main shock using Gaussian process regression (2021)
- Author(s)
  Morikawa, K., H. Nagao, S. Ito, Y. Terada, S. Sakai, and N. Hirata
- Journal Title
  Geophysical Journal International
  Volume: - Issue: 2 Pages: 1018-1035
- DOI
  10.1093/gji/ggab124
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Dynamic visualization for L1 fusion convex clustering in near-linear time (2021)
- Author(s)
  Bingyuan Zhang, Jie Chen, Yoshikazu Terada
- Journal Title
  Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
  Volume: 161 Pages: 515-524
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Statistical analysis of sparse approximate factor models (2020)
- Author(s)
  Poignard Benjamin, Terada Yoshikazu
- Journal Title
  Electronic Journal of Statistics
  Volume: 14 Issue: 2 Pages: 3315-3365
- DOI
  10.1214/20-ejs1745
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Classification from only positive and unlabeled functional data (2020)
- Author(s)
  Terada Yoshikazu, Ogasawara Issei, Nakata Ken
- Journal Title
  The Annals of Applied Statistics
  Volume: 14 Issue: 4 Pages: 1724-1742
- DOI
  10.1214/20-aoas1404
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Fast generalization error bound of deep learning without scale invariance of activation functions (2020)
- Author(s)
  Terada Yoshikazu, Hirose Ryoma
- Journal Title
  Neural Networks
  Volume: 129 Pages: 344-358
- DOI
  10.1016/j.neunet.2020.05.033
- Related Report
  2020 Research-status Report
- Peer Reviewed
[Presentation] A statistical theory of clustering (2023)
- Author(s)
  Yoshikazu Terada
- Organizer
  Forum "Math-for-Industry" (FMfI) 2023
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] On some properties of reconstructed trajectories from sparse longitudinal data (2023)
- Author(s)
  Yoshikazu Terada
- Organizer
  The 15th Scientific Meeting of the Classification and Data Analysis Group
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] On smoothing for spatial functional data (2023)
- Author(s)
  Yoshikazu Terada, Hidetoshi Matsui
- Organizer
  The 6th International Conference on Econometrics and Statistics
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Dynamic prediction for variable-domain functional data (2023)
- Author(s)
  Yoshikazu Terada, Hidetoshi Matsui
- Organizer
  The 12th Conference of the IASC-ARS (IASC-ARS2023)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Statistical theory and applications of clustering methods (2023)
- Author(s)
  Yoshikazu Terada
- Organizer
  The 43rd Information Measurement Online Seminar
- Related Report
  2023 Annual Research Report
- Invited
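[Presentation] An approximation method for large-scale clustering via vector quantization and its properties (2022)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  KAKENHI Symposium "Challenges toward a bidirectional understanding of data science and related fields"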
- Related Report
  2022 Research-status Report
- Invited
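[Presentation] An approximation method for large-scale clustering using representative points and its properties (2022)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  KAKENHI Symposium "Theory and methodology for large-scale complex data: new developments and applications to related fields"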
- Related Report
  2022 Research-status Report
- Invited
[Presentation] On weak convergence of recovered functional data (2022)
- Author(s)
  Yoshikazu Terada, Masaki Sasaki
- Organizer
  15th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2022)
- Related Report
  2022 Research-status Report
- Invited
[Presentation] Regularized functional subspace clustering (2022)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  CSDA & EcoSta Workshop on Statistical Data Science (SDS 2022)
- Related Report
  2022 Research-status Report
- Invited
[Presentation] Fast Approximation for large-scale clustering (2022)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  The 11th Conference of the IASC-ARS (the Asian Regional Section of the International Association for Statistical Computing)
- Related Report
  2021 Research-status Report
- Int'l Joint Research / Invited
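[Presentation] On a general-purpose computational cost reduction method for clustering (2021)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  2021 Symposium of the Japanese Classification Society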
- Related Report
  2021 Research-status Report
[Presentation] On a general-purpose computational cost reduction method for clustering (2020)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  2020 Japanese Joint Statistical Meeting
- Related Report
  2020 Research-status Report
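[Presentation] On a computational cost reduction method for large-scale clustering (2020)
- Author(s)
  Yoshikazu Terada, Michio Yamamoto
- Organizer
  The 5th Young Researchers' Symposium on Statistics and Machine Learning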
- Related Report
  2020 Research-status Report
- Invited
