Spectral clustering for large document data using the reduced similarity matrix

Research Project

Project/Area Number	20500124
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Ibaraki University
Principal Investigator	SHINNOU HIROYUKI Ibaraki University, Faculty of Engineering, Associate Professor (10250987)
Project Period (FY)	2008 – 2010
Project Status	Completed (Fiscal Year 2010)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2010: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2009: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000); Fiscal Year 2008: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
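Keywords	Reduced similarity matrix / Spectral clustering / Document clustering / Distance metric learning / Large margin nearest neighbor / Large-scale data / Committee / Distance between nouns / Thesaurus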
Research Abstract	This research proposed a spectral clustering method for large document data. First, the large document data is divided into small clusters by k-means. Then some reliable data are picked from each cluster, and a similarity matrix is constructed from these reliable data only. Because this matrix is reduced in size compared with the similarity matrix over all documents, spectral clustering can be applied to it. Furthermore, to improve the precision of clustering, the project studied distance measures between two nouns and distance metric learning for documents.
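The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function names (`kmeans`, `reduced_spectral_clustering`), the choice of "reliable data" as the single point nearest each small-cluster centroid, the Gaussian similarity with a locally estimated bandwidth, and the normalized-Laplacian embedding are all choices made here for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first initialisation."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centers

def reduced_spectral_clustering(X, n_clusters, n_small, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: divide the data into many small clusters with k-means.
    small, centers = kmeans(X, n_small, rng)
    # Step 2: pick one "reliable" point per small cluster.
    # Assumption: the member nearest to the centroid is the reliable one.
    kept, reps = [], []
    for j in range(n_small):
        members = np.flatnonzero(small == j)
        if members.size:
            d = ((X[members] - centers[j]) ** 2).sum(axis=1)
            kept.append(j)
            reps.append(members[d.argmin()])
    R = X[reps]
    # Step 3: build the reduced similarity matrix over representatives only.
    sq = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)           # no self-similarity
    sigma2 = max(sq.min(axis=1).mean(), 1e-12)
    W = np.exp(-sq / (2.0 * sigma2))       # diagonal becomes 0
    # Step 4: spectral clustering on the reduced matrix
    # (normalized Laplacian, smallest eigenvectors, then k-means).
    deg = np.maximum(W.sum(axis=1), 1e-12)
    L = np.eye(len(reps)) - W / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    rep_labels, _ = kmeans(U, n_clusters, rng)
    # Step 5: every point inherits the label of its small cluster's representative.
    small_to_final = np.full(n_small, -1)
    small_to_final[kept] = rep_labels
    return small_to_final[small]
```

The eigendecomposition here runs on an `n_small` by `n_small` matrix rather than on one whose size is the number of documents, which is the point of the reduction. For real document vectors a cosine similarity would be a natural substitute for the Gaussian kernel used in this sketch.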

Report

(4 results)

2010 Annual Research Report, Final Research Report (PDF)
2009 Annual Research Report
2008 Annual Research Report

Research Products
(24 results)

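[Presentation] Performance analysis of word sense disambiguation based on distance metric learning (2011)
- Author(s)
  佐々木稔, 新納浩幸
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi (E2-7)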
- Year and Date
  2011-03-11
- Related Report
  2010 Final Research Report
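[Presentation] Discovery of new word senses by supervised outlier detection (2011)
- Author(s)
  新納浩幸, 佐々木稔
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology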
- Year and Date
  2011-03-10
- Related Report
  2010 Annual Research Report
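[Presentation] Performance analysis of word sense disambiguation based on distance metric learning (2011)
- Author(s)
  佐々木稔, 新納浩幸
- Organizer
  The 17th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Toyohashi University of Technology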
- Year and Date
  2011-03-09
- Related Report
  2010 Annual Research Report
[Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)
- Author(s)
  Minoru Sasaki, Hiroyuki Shinnou
- Organizer
  The Fourth International Conference on Advances in Semantic Processing
- Place of Presentation
  Florence (Italy) (91-95)
- Year and Date
  2010-10-25
- Related Report
  2010 Final Research Report
[Presentation] Document Clustering Using Semantic Relationship Between Target Documents And Related Documents (2010)
- Author(s)
  Minoru Sasaki, Hiroyuki Shinnou
- Organizer
  The Fourth International Conference on Advances in Semantic Processing
- Place of Presentation
  Novotel Firenze Nord Aeroporto Hotel (Florence, Italy)
- Year and Date
  2010-10-25
- Related Report
  2010 Annual Research Report
[Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  LREC-2010
- Place of Presentation
  Valletta (Republic of Malta)
- Year and Date
  2010-05-20
- Related Report
  2010 Final Research Report
[Presentation] Detection of Peculiar Examples using LOF and One Class SVM (2010)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  The Seventh International Conference on Language Resources and Evaluation
- Place of Presentation
  Mediterranean Conference Centre (Valletta, Republic of Malta)
- Year and Date
  2010-05-20
- Related Report
  2010 Annual Research Report
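[Presentation] Construction of semantically related word sets using a Web directory (2010)
- Author(s)
  佐々木稔, 三上健太, 新納浩幸
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo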
- Year and Date
  2010-03-11
- Related Report
  2009 Annual Research Report
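[Presentation] Construction of genre vectors for nouns using a Web directory (2010)
- Author(s)
  林華, 新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo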
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
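[Presentation] Detection of peculiar examples using LOF and One Class SVM (2010)
- Author(s)
  新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo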
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
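[Presentation] Estimation of the predominant sense of nouns and its application to word sense disambiguation (2010)
- Author(s)
  江口晃, 新納浩幸, 佐々木稔
- Organizer
  The 16th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  The University of Tokyo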
- Year and Date
  2010-03-10
- Related Report
  2009 Annual Research Report
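[Presentation] An initial value setting method for Weighted Kernel K-means for document clustering (2009)
- Author(s)
  茂木哲矢, 新納浩幸, 佐々木稔
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori (D4-5)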
- Year and Date
  2009-03-05
- Related Report
  2010 Final Research Report
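[Presentation] Estimation of attribute weights for measuring similarity between examples (2009)
- Author(s)
  新納浩幸, 佐々木稔
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University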
- Related Report
  2008 Annual Research Report
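[Presentation] Extraction of related words for search terms from product descriptions (2009)
- Author(s)
  久保田敦, 佐々木稔, 新納浩幸
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University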
- Related Report
  2008 Annual Research Report
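[Presentation] Clustering of word usage examples by graph clustering (2009)
- Author(s)
  相原功昌, 佐々木稔, 新納浩幸
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University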
- Related Report
  2008 Annual Research Report
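[Presentation] An initial value setting method for Weighted Kernel K-means for document clustering (2009)
- Author(s)
  茂木哲矢, 新納浩幸, 佐々木稔
- Organizer
  The 15th Annual Meeting of the Association for Natural Language Processing
- Place of Presentation
  Tottori University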
- Related Report
  2008 Annual Research Report
[Presentation] A clustering method that takes data of unknown similarity as clues (2009)
- Author(s)
  佐々木稔, 松本良太, 新納浩幸
- Organizer
  DEIM Forum 2009
- Place of Presentation
  Tsumagoi Resort, Shizuoka Prefecture
- Related Report
  2008 Annual Research Report
[Presentation] Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size (2008)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  LREC-2008
- Place of Presentation
  Marrakech (Morocco)
- Year and Date
  2008-05-28
- Related Report
  2010 Final Research Report
[Presentation] Ping-pong Document Clustering using NMF and Linkage-Based Refinement (2008)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  LREC-2008
- Place of Presentation
  Marrakech (Morocco)
- Year and Date
  2008-05-28
- Related Report
  2010 Final Research Report
[Presentation] Ping-pong Document Clustering using NMF and Linkage-Based Refinement (2008)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  Language Resources and Evaluation (LREC) 2008
- Place of Presentation
  Marrakech (Morocco)
- Related Report
  2008 Annual Research Report
[Presentation] Spectral Clustering for a Large Data Set by Reducing the Similarity Matrix Size (2008)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  Language Resources and Evaluation (LREC) 2008
- Place of Presentation
  Marrakech (Morocco)
- Related Report
  2008 Annual Research Report
[Presentation] Division of Example Sentences Based on the Meaning of a Target Word Using Semi-supervised Clustering (2008)
- Author(s)
  Hiroyuki Shinnou, Minoru Sasaki
- Organizer
  Language Resources and Evaluation (LREC) 2008
- Place of Presentation
  Marrakech (Morocco)
- Related Report
  2008 Annual Research Report
[Remarks]
- URL
  http://info.ibaraki.ac.jp/script/websearch/index.htm
- Related Report
  2009 Annual Research Report
[Remarks]
- URL
  http://info.ibaraki.ac.jp/scripts/websearch/index.htm
- Related Report
  2008 Annual Research Report
