Spectral clustering for large document data using the reduced similarity matrix
Project/Area Number |
20500124
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Ibaraki University |
Principal Investigator |
SHINNOU HIROYUKI Ibaraki University, College of Engineering, Associate Professor (10250987)
|
Project Period (FY) |
2008 – 2010
|
Project Status |
Completed (Fiscal Year 2010)
|
Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2010: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2009: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2008: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
|
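Keywords | reduced similarity matrix / spectral clustering / document clustering / distance metric learning / large margin nearest neighbor / large-scale data / committee / distance between nouns / thesaurus |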
Research Abstract |
In this research, I proposed a spectral clustering method for large document data. First, the large document data is divided into small clusters by k-means. Then, some reliable data points are picked from each cluster, and a similarity matrix is constructed from these reliable data. Because this matrix is much smaller than the similarity matrix of the full data set, spectral clustering can be applied to it. Furthermore, to improve the precision of clustering, I researched distance measures between two nouns and distance metric learning for documents.
|
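The pipeline in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the abstract does not say how "reliable" data are chosen or how the remaining documents receive their final labels, so the sketch assumes that reliable points are those closest to their k-means centroid and that every document takes the label of its most similar reliable point. Cosine similarity, normalized spectral clustering, and all function and parameter names (`reduced_spectral_clustering`, `n_small`, `per_cluster`) are likewise illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    def assign(c):
        dists = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assign(centroids), centroids

def pick_representatives(X, labels, centroids, per_cluster):
    """Assumption: 'reliable' means closest to the cluster centroid."""
    chosen = []
    for j, c in enumerate(centroids):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue
        d = ((X[idx] - c) ** 2).sum(axis=1)
        chosen.extend(idx[np.argsort(d)[:per_cluster]])
    return np.array(chosen)

def spectral_labels(S, n_clusters, seed=0):
    """Normalized spectral clustering on a (small) similarity matrix S."""
    deg = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]          # eigenvectors of the largest ones
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    labels, _ = kmeans(U, n_clusters, seed=seed)
    return labels

def reduced_spectral_clustering(X, n_clusters, n_small=20, per_cluster=3, seed=0):
    """X: (n_docs, n_features) non-negative document vectors, e.g. tf-idf."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # Step 1: divide the data into many small clusters with k-means.
    small_labels, centroids = kmeans(X, n_small, seed=seed)
    # Step 2: pick a few reliable points from each small cluster.
    reps = pick_representatives(X, small_labels, centroids, per_cluster)
    # Step 3: reduced similarity matrix (cosine) over the reliable points only.
    S = np.clip(X[reps] @ X[reps].T, 0.0, None)
    # Step 4: spectral clustering on the reduced matrix.
    rep_labels = spectral_labels(S, n_clusters, seed=seed)
    # Assumption: each document inherits the label of its most similar
    # reliable point; the abstract does not describe this step.
    nearest = (X @ X[reps].T).argmax(axis=1)
    return rep_labels[nearest]
```

With `n_small` small clusters and `per_cluster` points from each, the eigendecomposition runs on a matrix of at most `n_small * per_cluster` rows rather than one row per document, which is what makes the spectral step affordable on large collections.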
Report
(4 results)
Research Products
(24 results)