
Development of clustering method for large and complex data and its theoretical properties

Research Project

Project/Area Number 20K19756
Research Category Grant-in-Aid for Early-Career Scientists
Allocation Type Multi-year Fund
Review Section Basic Section 60030: Statistical science-related
Research Institution Osaka University

Principal Investigator

Terada Yoshikazu, Osaka University, Graduate School of Engineering Science, Associate Professor (10738793)

Project Period (FY) 2020-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2022: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Keywords Clustering / Speed-up / Unsupervised learning / Large-scale data / Computational complexity reduction / Asymptotic theory / Dynamic programming
Outline of Research at the Start

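In recent years, as data have become larger and more complex, the importance of unsupervised classification problems has been recognized anew. For large-scale data, however, only simple methods with low computational cost, such as k-means, have been applied, and these may fail to adequately capture the classification structure underlying the data. In this study, we propose clustering methods that can capture complex cluster structures while running quickly even on large-scale data, and we establish theoretical guarantees for them.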

Outline of Final Research Achievements

In this study, we developed a general-purpose method for reducing the computational cost of large-scale clustering and established its theoretical properties. We also developed a fast algorithm for convex clustering, which can flexibly capture hierarchical group structures. With the proposed methods, complex clustering techniques can be applied to more than one million data points in under a minute, even on a laptop. This enables rapid estimation of the cluster structures underlying large and complex data.
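The record does not describe how the cost reduction works. Two of the presentation titles listed under Research Products refer to approximations based on vector quantization and on representative points, which suggests a common pattern: compress the n data points into m ≪ n representatives, run the expensive clustering only on the representatives, and let every point inherit its representative's label. The sketch below illustrates that generic pattern under our own assumptions; it is not the authors' algorithm. The function names (`quantize`, `single_linkage`, `approx_cluster`), the use of Lloyd's k-means for quantization, and single linkage as the stand-in "complex" method are all hypothetical choices.

```python
import numpy as np


def quantize(X, m, n_iter=20, seed=0):
    """Compress X (n x d) into m representatives with Lloyd's k-means (assumed choice).

    Returns the representatives (m x d) and, for each row of X,
    the index of its nearest representative.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    for _ in range(n_iter):
        # Full n x m squared-distance matrix: fine for a demo, chunk it for very large n.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(m):
            members = X[assign == j]
            if len(members) > 0:  # keep the old center if its cluster becomes empty
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def single_linkage(P, k):
    """Cut the single-linkage hierarchy of P (m x d) into k clusters.

    Kruskal-style: merge the closest pairs via union-find until
    exactly k connected components remain.
    """
    m = len(P)
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    iu, ju = np.triu_indices(m, k=1)
    n_components = m
    for e in np.argsort(d2[iu, ju], kind="stable"):
        if n_components == k:
            break
        a, b = find(int(iu[e])), find(int(ju[e]))
        if a != b:
            parent[a] = b
            n_components -= 1
    roots = [find(i) for i in range(m)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def approx_cluster(X, m, k, seed=0):
    """Run the expensive clustering step on m representatives only."""
    centers, assign = quantize(X, m, seed=seed)
    rep_labels = single_linkage(centers, k)
    # Each data point inherits the label of its nearest representative.
    return rep_labels[assign]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(10.0, 0.5, (200, 2))])
    labels = approx_cluster(X, m=20, k=2)
    print(np.bincount(labels))  # cluster sizes
```

Single linkage ignores how many points each representative stands for; a method whose objective depends on cluster sizes would instead take the representative counts (for example `np.bincount(assign)`) as weights. At the million-point scale described above, the n × m distance computation in `quantize` would be done in chunks rather than as one array.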

Academic Significance and Societal Importance of the Research Achievements

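As data have grown larger and more complex in recent years, clustering methods for discovering group structure in data have become increasingly important. Until now, however, only clustering methods that capture simple cluster structures could be applied to large-scale data. The results of this study make it possible, in any field that requires clustering, to estimate complex cluster structures from large-scale data quickly and easily. Applying this research is expected to yield new findings across a variety of application fields.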

Report

(5 results)
  • 2023 Annual Research Report / Final Research Report (PDF)
  • 2022 Research-status Report
  • 2021 Research-status Report
  • 2020 Research-status Report

Research Products

(24 results)


Int'l Joint Research (3 results) / Journal Article (8 results; of which Int'l Joint Research: 1 result, Peer Reviewed: 8 results, Open Access: 5 results) / Presentation (13 results; of which Int'l Joint Research: 5 results, Invited: 11 results)

  • [Int'l Joint Research] Erasmus University Rotterdam (Netherlands)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Erasmus University Rotterdam (Netherlands)

    • Related Report
      2022 Research-status Report
  • [Int'l Joint Research] Erasmus University Rotterdam (Netherlands)

    • Related Report
      2020 Research-status Report
  • [Journal Article] Sparse kernel k-means for high-dimensional data (2023)

    • Author(s)
      Guan Xin, Terada Yoshikazu
    • Journal Title

      Pattern Recognition

      Volume: 144 Article: 109873

    • DOI

      10.1016/j.patcog.2023.109873

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Selective inference after feature selection via multiscale bootstrap (2022)

    • Author(s)
      Terada Yoshikazu, Shimodaira Hidetoshi
    • Journal Title

      Annals of the Institute of Statistical Mathematics

      Volume: 75 Issue: 1 Pages: 99-125

    • DOI

      10.1007/s10463-022-00838-2

    • Related Report
      2022 Research-status Report
    • Peer Reviewed
  • [Journal Article] Sparse and Simple Structure Estimation via Prenet Penalization (2022)

    • Author(s)
      Hirose Kei, Terada Yoshikazu
    • Journal Title

      Psychometrika

      Volume: 1 Issue: 4 Pages: 1-26

    • DOI

      10.1007/s11336-022-09868-4

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Forecasting temporal variation of aftershocks immediately after a main shock using Gaussian process regression (2021)

    • Author(s)
      Morikawa, K., H. Nagao, S. Ito, Y. Terada, S. Sakai, and N. Hirata
    • Journal Title

      Geophysical Journal International

      Volume: - Issue: 2 Pages: 1018-1035

    • DOI

      10.1093/gji/ggab124

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Dynamic visualization for L1 fusion convex clustering in near-linear time (2021)

    • Author(s)
      Bingyuan Zhang, Jie Chen, Yoshikazu Terada
    • Journal Title

      Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence

      Volume: 161 Pages: 515-524

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Statistical analysis of sparse approximate factor models (2020)

    • Author(s)
      Poignard Benjamin, Terada Yoshikazu
    • Journal Title

      Electronic Journal of Statistics

      Volume: 14 Issue: 2 Pages: 3315-3365

    • DOI

      10.1214/20-ejs1745

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Classification from only positive and unlabeled functional data (2020)

    • Author(s)
      Terada Yoshikazu, Ogasawara Issei, Nakata Ken
    • Journal Title

      The Annals of Applied Statistics

      Volume: 14 Issue: 4 Pages: 1724-1742

    • DOI

      10.1214/20-aoas1404

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Fast generalization error bound of deep learning without scale invariance of activation functions (2020)

    • Author(s)
      Terada Yoshikazu, Hirose Ryoma
    • Journal Title

      Neural Networks

      Volume: 129 Pages: 344-358

    • DOI

      10.1016/j.neunet.2020.05.033

    • Related Report
      2020 Research-status Report
    • Peer Reviewed
  • [Presentation] A statistical theory of clustering (2023)

    • Author(s)
      Yoshikazu Terada
    • Organizer
      Forum "Math-for-Industry" (FMfI) 2023
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] On some properties of reconstructed trajectories from sparse longitudinal data (2023)

    • Author(s)
      Yoshikazu Terada
    • Organizer
      The 15th Scientific Meeting of the Classification and Data Analysis Group
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] On smoothing for spatial functional data (2023)

    • Author(s)
      Yoshikazu Terada, Hidetoshi Matsui
    • Organizer
      The 6th International Conference on Econometrics and Statistics
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Dynamic prediction for variable-domain functional data (2023)

    • Author(s)
      Yoshikazu Terada, Hidetoshi Matsui
    • Organizer
      The 12th Conference of the IASC-ARS (IASC-ARS2023)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Statistical theory of clustering methods and its applications (2023)

    • Author(s)
      Yoshikazu Terada
    • Organizer
      The 43rd Online Seminar on Information Measurement
    • Related Report
      2023 Annual Research Report
    • Invited
  • [Presentation] An approximation method for large-scale clustering via vector quantization and its properties (2022)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      KAKENHI Symposium "Toward a Bidirectional Understanding of Data Science and Its Surrounding Fields"
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] An approximation method for large-scale clustering using representative points and its properties (2022)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      KAKENHI Symposium "Theory and Methodology for Large-Scale Complex Data: New Developments and Applications to Related Fields"
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] On weak convergence of recovered functional data (2022)

    • Author(s)
      Yoshikazu Terada, Masaki Sasaki
    • Organizer
      15th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2022)
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] Regularized functional subspace clustering (2022)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      CSDA & EcoSta Workshop on Statistical Data Science (SDS 2022)
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] Fast approximation for large-scale clustering (2022)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      The 11th Conference of the IASC-ARS (the Asian Regional Section of the International Association for Statistical Computing)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research / Invited
  • [Presentation] On a general-purpose computational cost reduction method for clustering (2021)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      Japanese Classification Society Symposium, FY2021
    • Related Report
      2021 Research-status Report
  • [Presentation] On a general-purpose computational cost reduction method for clustering (2020)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      2020 Japanese Joint Statistical Meeting
    • Related Report
      2020 Research-status Report
  • [Presentation] On methods for reducing computational complexity in large-scale clustering (2020)

    • Author(s)
      Yoshikazu Terada, Michio Yamamoto
    • Organizer
      The 5th Young Researchers' Symposium on Statistics and Machine Learning
    • Related Report
      2020 Research-status Report
    • Invited


Published: 2020-04-28   Modified: 2025-01-30  
