A Clustering Algorithm based on Mutually Ranking

Research Project

Project/Area Number	16K00165
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Multimedia database
Research Institution	Tokyo Metropolitan College of Industrial Technology
Principal Investigator	Michihiro Kobayakawa, Tokyo Metropolitan College of Industrial Technology, Department of Monozukuri Engineering, Professor (00334582)
Project Period (FY)	2016-10-21 – 2020-03-31
Project Status	Completed (Fiscal Year 2019)
Budget Amount	¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000); Fiscal Year 2018: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000); Fiscal Year 2017: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000); Fiscal Year 2016: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	Clustering / Mutual neighbor graph / Mutual ranking / Similarity / Non-metric / Retrieval / Big data analysis and utilization / Clustering method
Outline of Final Research Achievements	Clustering algorithms are a fundamental tool for analyzing data sets. Most of them rely on a distance between feature vectors extracted from each data item. However, a feature vector cannot always be extracted; in such cases, we describe each data item by a set instead. With set-based features, distance-based clustering algorithms cannot be applied, so a new clustering algorithm is needed that works with both similarity and distance. The key idea of our algorithm is to build a mutually nearest neighbor graph (MNN-Graph). The algorithm consists of 5 steps: (1) extract features from the data set, (2) build an MNN-Graph whose vertices are the data items, (3) extract cliques from the MNN-Graph, (4) return to step (2) until a termination condition is met, (5) combine similar sub-graph sets and output the resulting sets as clusters. We ran experiments on a set of document data. The experiments suggest that the clustering accuracy was acceptable.
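Steps (2) and (3) of the outline can be illustrated with a minimal sketch. It is not the project's implementation: the Jaccard similarity, the neighborhood size `k`, the toy word sets in `docs`, and the function names `mutual_knn_graph` and `maximal_cliques` are all assumptions made for illustration, and clique extraction uses the standard Bron-Kerbosch enumeration. Steps (4) and (5) are omitted because the outline does not specify the termination condition or the merging criterion.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure; 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mutual_knn_graph(features, k, sim=jaccard):
    """Adjacency sets where i--j iff each item ranks the other in its top k."""
    n = len(features)
    topk = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Rank by descending similarity; ties are broken by index.
        others.sort(key=lambda j: (-sim(features[i], features[j]), j))
        topk.append(set(others[:k]))
    # Keep an edge only when the ranking is mutual.
    return {i: {j for j in topk[i] if i in topk[j]} for i in range(n)}


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy data: each document is described by a set of words.
docs = [
    {"graph", "clique", "vertex"},
    {"graph", "clique", "edge"},
    {"graph", "vertex", "edge"},
    {"music", "audio", "signal"},
    {"music", "audio", "retrieval"},
]

graph = mutual_knn_graph(docs, k=2)
print(maximal_cliques(graph))  # → [[0, 1, 2], [3, 4]]
```

Only the rank order of similarities is used to build the graph, so any similarity function can be passed as `sim`, whether or not it satisfies the distance axioms.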
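Academic Significance and Societal Importance of the Research Achievements	Data analysis technology will be essential as a core technology driving Society 5.0, and data analysis techniques that use AI and related methods are being actively developed. However, depending on which features are selected, the analysis may not even be possible. This research provides a clustering algorithm based on the simple property that data items are mutually similar, and the structure by which it generates clusters is simple. It is therefore applicable regardless of whether the similarity measure between data items satisfies the distance axioms. Although its accuracy, speed, and other aspects may fall short in some cases, it can be positioned as a general-purpose clustering algorithm.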

Report

(5 results)

2019 Annual Research Report / Final Research Report (PDF)
2018 Research-status Report
2017 Research-status Report
2016 Research-status Report