Research on Concept Search and Visualization for Massive Japanese and English Documents

Research Project

Project/Area Number	16500057
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Media informatics/Database
Research Institution	Toyohashi University of Technology
Principal Investigator	MASAKI Aono Toyohashi University of Technology, Faculty of Engineering, Professor (00372540)
Project Period (FY)	2004 – 2006
Project Status	Completed (Fiscal Year 2006)
Budget Amount	¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 2006: ¥1,100,000 (Direct Cost: ¥1,100,000); Fiscal Year 2005: ¥1,100,000 (Direct Cost: ¥1,100,000); Fiscal Year 2004: ¥1,400,000 (Direct Cost: ¥1,400,000)
Keywords	Concept Search / Clustering / Ontology / Dimensional Reduction / Vector Space Model / Information Visualization / Query Expansion
Research Abstract	The main objective of this research has been to leverage the vector space model for "concept search", i.e., to search for "conceptually similar" documents given a query document in either Japanese or English. Over the past three years, we have developed two new methods that can be applied to concept search: the first is based on a new data structure that represents a hierarchy of clusters for massive document collections, and the second is based on tuned categorization followed by category-wise dimensional reduction using latent semantic indexing (LSI). The most important advantage of the vector space model over other document models is that, once a document has been transformed into a vector, processing is independent of its language. For the first method, developed in 2004, we built a hierarchy of clusters by varying the size of the sampled document collection by powers of two (specifically 16, 32, 64, 128, 256, 512, 1024, and 2048) and applying a "co-clustering" algorithm to each sampled collection to obtain word-document correlated clusters. This method works well for patent data, which tend to exhibit strong interrelationships between words and documents. For example, "pachinko" or "video game" appears only in subclass "A63F" of the IPC (International Patent Classification). We participated in the NTCIR-5 patent task organized by NII and gave a poster presentation at the NTCIR-5 international workshop in Tokyo in December 2005. A paper on this method was submitted to and accepted by the AIRS 2005 international conference, held in Jeju, Korea. Although this method offered a new way to automatically group patent data at different levels of granularity by combining hierarchical sampling with "co-clustering", it had difficulty identifying minor clusters.
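The abstract does not say which co-clustering algorithm was used or how the samples were drawn. The following is a minimal Python sketch under two labeled assumptions: spectral co-clustering of the word-document bipartite graph in the style of Dhillon (2001), reduced here to a two-way split by the sign of the second singular vectors, and nested random document samples whose sizes follow the powers of two listed above. The function names `power_of_two_sizes`, `spectral_bipartition`, and `sampled_coclusters` are hypothetical, and the sketch requires NumPy.

```python
import numpy as np


def power_of_two_sizes(lo=16, hi=2048):
    """Sample sizes 16, 32, ..., 2048, as listed in the abstract."""
    sizes = []
    n = lo
    while n <= hi:
        sizes.append(n)
        n *= 2
    return sizes


def spectral_bipartition(A):
    """Split the words (rows) and documents (columns) of a nonnegative
    word-document matrix into two co-clusters (assumed Dhillon-style, k = 2).
    Assumes every row and column has at least one nonzero entry."""
    A = np.asarray(A, dtype=float)
    r1 = 1.0 / np.sqrt(A.sum(axis=1))  # D1^{-1/2}: inverse sqrt of word degrees
    r2 = 1.0 / np.sqrt(A.sum(axis=0))  # D2^{-1/2}: inverse sqrt of document degrees
    An = r1[:, None] * A * r2[None, :]  # D1^{-1/2} A D2^{-1/2}
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    # The first singular pair is trivial; the second one carries the split.
    # Since An @ v2 = s2 * u2, positive word entries pair with positive documents.
    z_words = r1 * U[:, 1]
    z_docs = r2 * Vt[1, :]
    return (z_words > 0).astype(int), (z_docs > 0).astype(int)


def sampled_coclusters(A, sizes, seed=0):
    """Co-cluster nested random document samples of increasing size.
    Hypothetical driver; returns {sample size: {document id: label}}."""
    A = np.asarray(A, dtype=float)
    n_docs = A.shape[1]
    order = np.random.default_rng(seed).permutation(n_docs)
    levels = {}
    for m in sizes:
        if m > n_docs:
            break
        docs = order[:m]  # each sample contains the previous, smaller one
        sub = A[:, docs]
        words = sub.sum(axis=1) > 0  # drop words absent from this sample
        _, doc_labels = spectral_bipartition(sub[words])
        levels[m] = dict(zip(docs.tolist(), doc_labels.tolist()))
    return levels
```

Each level maps the sampled document IDs to a co-cluster label, so coarse and fine samples can be compared. A fuller version would extract more than two clusters per level; Dhillon's method does this by running k-means on several singular vectors instead of splitting on a sign.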
For the second method, developed in 2006, we first classified the entire patent collection into about 200 categories based on the IPC and then applied LSI to each category separately. This method overcomes the weakness of the first method because it does not rely on sampling, which can leave out documents and therefore fail to identify minor clusters. A paper on this method has been submitted to NTCIR-6, an international workshop to be held in Tokyo in May 2007. Search result visualization has been another objective of this research, and we have developed several visualization methods. They fall into two categories: one visualizes clusters after summarizing the search results, and the other uses maps to help users understand geographical locations when the retrieved documents contain geographical information such as prefecture, city, town, and village names. The concept search research has also led to "Semantic Web" applications. In particular, we have investigated new algorithms for "ontology" alignment, where an "ontology" denotes a shared conceptualization. One algorithm was submitted to and accepted by ASWC (Asian Semantic Web Conference), held in Beijing in 2006. Our method was competitive with the best-known algorithms in this research field at that time.
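The category-wise LSI step can be sketched as follows. This is a minimal NumPy illustration, not the project's implementation: it assumes a term-document matrix, one category label per document (standing in for the roughly 200 IPC-based categories), and a caller that already knows which category a query belongs to. The names `build_category_lsi` and `search` are hypothetical, and the rank `k` is an illustrative parameter not taken from the report.

```python
import numpy as np


def build_category_lsi(X, categories, k):
    """Build one truncated-SVD (LSI) space per category.

    X: (n_terms, n_docs) term-document matrix.
    categories: one label per document, e.g. an IPC-based category code.
    k: number of latent dimensions kept per category (illustrative).
    Returns {category: (U_k, document coordinates, global document ids)}.
    """
    X = np.asarray(X, dtype=float)
    categories = np.asarray(categories)
    models = {}
    for c in np.unique(categories):
        ids = np.flatnonzero(categories == c)
        Xc = X[:, ids]
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        Uk = U[:, : min(k, len(s))]
        # Documents projected onto the category's k-dim space (equals S_k V_k^T).
        models[c] = (Uk, Uk.T @ Xc, ids)
    return models


def search(models, query, category, top=5):
    """Fold a query vector into one category's LSI space and rank that
    category's documents by cosine similarity."""
    Uk, coords, ids = models[category]
    q = Uk.T @ np.asarray(query, dtype=float)
    denom = np.linalg.norm(coords, axis=0) * np.linalg.norm(q)
    sims = (coords.T @ q) / np.where(denom == 0, 1.0, denom)
    order = np.argsort(-sims)[:top]
    return [(int(ids[i]), float(sims[i])) for i in order]
```

Because each category has its own reduced space, similarity scores from different categories are not directly comparable; this is one reason the sketch routes a query to a single category. Since no sampling is involved, every document in a category contributes to that category's SVD.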

Report

(4 results)

2006 Annual Research Report
2006 Final Research Report Summary
2005 Annual Research Report
2004 Annual Research Report

Research Products
(38 results)


Journal Article (37 results), Book (1 result)

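[Journal Article] A Web Page Recommendation System Focusing on User and Tag Information in SBM (Social Bookmarking) Services (2007)
- Author(s)
  杉山典之, 関洋平, 青野雅樹
- Journal Title
  IPSJ 69th National Convention (Waseda University) 4T-4
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary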
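[Journal Article] Category Analysis of Web Book Reviews and Automatic Extraction of Readers' Impressions and Emotions (2007)
- Author(s)
  佐々木若菜, 関洋平, 青野雅樹
- Journal Title
  13th Annual Meeting of the Association for Natural Language Processing
  Pages: 408-411
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary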
[Journal Article] A Place-Name Extraction Method for Visualizing News Articles on the Web (2007)
- Author(s)
  石田大和, 青野雅樹
- Journal Title
  IEICE Tokai Section, published on the Web: http://www.takagi.i.is.nagoya-u.ac.jp/ieice/
  Pages: 1-1
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] A Place-Name Extraction Method for Visualizing News Articles on the Web (2007)
- Author(s)
  石田大和, 青野雅樹
- Journal Title
  IEICE Tokai Section, published on the Web: http://www.takagi.i.is.nagoya-u.ac.jp/ieice
  Pages: 1-1
- Related Report
  2006 Annual Research Report
[Journal Article] Time Series Data Mining for Multimodal Bio-Signal Data (2006)
- Author(s)
  Masaki Aono, Y.Sekiguchi, et al.
- Journal Title
  International Journal of Computer Science and Network Security Vol. 6, No. 10
  Pages: 1-9
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Annual Research Report; 2006 Final Research Report Summary
[Journal Article] Automatic Alignment of Ontology Eliminating the Probable Misalignments (2006)
- Author(s)
  Hanif Seddiqui, Y.Seki, Masaki Aono
- Journal Title
  The Semantic Web - ASWC 2006 (in a book 'Lecture Notes in Computer Science 4185' from Springer-Verlag)
  Pages: 212-218
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] A Method of Rating the Credibility of News Documents on the Web (2006)
- Author(s)
  Ryosuke Nagura, Y.Seki, Masaki Aono
- Journal Title
  Proc. ACM SIGIR (Special Interest Group on Information Retrieval) Vol. 29
  Pages: 683-684
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Annual Research Report; 2006 Final Research Report Summary
[Journal Article] Exploring Overlapping Clusters using Dynamic Rescaling and Sampling (2006)
- Author(s)
  Mei Kobayashi, Masaki Aono
- Journal Title
  Knowledge and Information Systems (Springer-Verlag) Vol. 10, No. 3
  Pages: 295-313
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Similarity Search for 3D Models Using Independent Component Analysis (2006)
- Author(s)
  立間淳司, 青野雅樹, 関洋平, 大渕竜太郎
- Journal Title
  IPSJ 68th National Convention (Kogakuin University) 3M-5
- NAID
  170000171427
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Alignment of Ontology Constructing Similarity Matrices and Resolving the Amount of the Matrices (2006)
- Author(s)
  Hanif Seddiqui, Y.Seki, Masaki Aono
- Journal Title
  IPSJ 68th National Convention (Kogakuin University) 4N-6
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
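[Journal Article] A Proposal of Credibility Measures for News Articles on the Web (2006)
- Author(s)
  奈倉良介, 関洋平, 青野雅樹
- Journal Title
  IPSJ 68th National Convention (Kogakuin University) 3P-5
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary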
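[Journal Article] An Attempt at Time-Series Mining of Biological Signal Data (2006)
- Author(s)
  坂倉奨, 青野雅樹, 関洋平
- Journal Title
  IPSJ 68th National Convention (Kogakuin University) 7P-2
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary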
[Journal Article] Shape Similarity Search for 3D Models Based on a Multiple Fourier Spectra Representation (2006)
- Author(s)
  立間淳司, 関洋平, 青野雅樹
- Journal Title
  IEICE, Web Intelligence and Interaction, IEICE SIG Notes, WI2-2006-83
  Pages: 89-94
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Annual Research Report; 2006 Final Research Report Summary
[Journal Article] Time Series Data Mining for Multimodal Bio-Signal Data (2006)
- Author(s)
  Masaki Aono, Y.Sekiguchi, Y.Yasuda, N.Suzuki, Y.Seki
- Journal Title
  International Journal of Computer Science and Network Security Vol. 6, No. 10
  Pages: 1-9
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Exploring overlapping clusters using dynamic re-scaling and sampling (2006)
- Author(s)
  Mei Kobayashi, Masaki Aono
- Journal Title
  Knowledge and Information Systems Vol. 10, No. 3
  Pages: 295-313
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Automatic Alignment of Ontology Eliminating the Probable Misalignments (2006)
- Author(s)
  Hanif Md.Seddiqui, Yohei Seki, Masaki Aono
- Journal Title
  The Semantic Web - ASWC 2006, Lecture Notes in Computer Science 4185 (R. Mizoguchi et al. eds.) (Springer)
  Pages: 212-218
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] A Method for Query Expansion Using a Hierarchy of Clusters (2006)
- Author(s)
  Masaki Aono, Hironori Doi
- Journal Title
  Information Retrieval Technology, Lecture Notes in Computer Science 3689 (Gary G. Lee, et al. eds.) (Springer)
  Pages: 479-483
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Automatic Alignment of Ontology Eliminating the Probable Misalignments (2006)
- Author(s)
  Hanif Seddiqui, Y.Seki, Masaki Aono
- Journal Title
  The Semantic Web - ASWC 2006 (in a book 'Lecture Notes in Computer Science 4185' from Springer-Verlag)
  Pages: 212-218
- Related Report
  2006 Annual Research Report
[Journal Article] Exploring Overlapping Clusters using Dynamic Rescaling and Sampling (2006)
- Author(s)
  Mei Kobayashi, Masaki Aono
- Journal Title
  Knowledge and Information Systems (Springer-Verlag) Vol. 10, No. 3
  Pages: 295-313
- Related Report
  2006 Annual Research Report
[Journal Article] A Method for Improving Concept Search of Patent Data Using Document-Word Co-clustering (2005)
- Author(s)
  青野雅樹, 土肥広典
- Journal Title
  DEWS2005, the 16th Data Engineering Workshop
  Pages: 8-8
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary; 2004 Annual Research Report
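[Journal Article] A Proposal of a Content-Based Information Recommendation System Based on RSS (2005)
- Author(s)
  向井誠, 青野雅樹
- Journal Title
  IPSJ 67th National Convention (University of Electro-Communications) 2U-8
- NAID
  170000170447
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary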
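[Journal Article] Representation of Personal Music Preference Data in OWL and Its Application (2005)
- Author(s)
  武内祐一, 青野雅樹
- Journal Title
  IPSJ 67th National Convention (University of Electro-Communications) 3U-8
- NAID
  170000170455
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary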
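[Journal Article] Mutual Conversion of University Syllabi by DTD Matching (2005)
- Author(s)
  平野健太郎, 青野雅樹
- Journal Title
  IPSJ 67th National Convention (University of Electro-Communications) 4W-5
- NAID
  170000170477
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary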
[Journal Article] A Query Expansion Method Using Co-clustering (2005)
- Author(s)
  土肥広典, 青野雅樹
- Journal Title
  IEICE, Web Intelligence and Interaction, IEICE SIG Notes, WI2-2005-18
  Pages: 43-48
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] A Method for Detecting Outlier Documents Using a Hierarchy of Cluster Granularity (2005)
- Author(s)
  青野雅樹
- Journal Title
  IEICE Technical Report DE2005-30 (2005-7)
  Pages: 1-6
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary; 2005 Annual Research Report
[Journal Article] Music Preference Data Representation Using OWL and Its Application to Music Information Recommendation (2005)
- Author(s)
  武内裕一, 青野雅樹
- Journal Title
  IEICE Technical Report DE2005-66 (2005-7)
  Pages: 7-11
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] XML Conversion and Content Analysis of HTML Syllabi Using Information Science Courses (2005)
- Author(s)
  平野健太郎, 青野雅樹
- Journal Title
  IEICE, Web Intelligence and Interaction, IEICE SIG Notes, WI2-2005-42
  Pages: 83-88
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
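[Journal Article] A Prototype of a Personalized Content-Based Information Recommendation System Based on RSS (2005)
- Author(s)
  向井誠, 青野雅樹
- Journal Title
  Bulletin of the Joint Meeting of the SIGs on Natural Language Processing and Fundamental Informatics, 2005-NL-169
  Pages: 27-32
- NAID
  110002952137
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary; 2005 Annual Research Report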
[Journal Article] A Method for Query Expansion Using a Hierarchy of Clusters (2005)
- Author(s)
  Masaki Aono, Hironori Doi
- Journal Title
  AIRS 2005 (Asia Information Retrieval Symposium), in a book "Information Retrieval Technology", Lecture Notes in Computer Science 3689 (Gary G. Lee, et al. eds.) (Springer Verlag)
  Pages: 479-484
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] A Patent Retrieval Method Using a Hierarchy of Clusters at TUT (2005)
- Author(s)
  Hironori Doi, Yohei Seki, Masaki Aono
- Journal Title
  Proceedings of the Fifth NTCIR Workshop
  Pages: 287-291
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary; 2005 Annual Research Report
[Journal Article] Web Community Mining (2005)
- Author(s)
  青野雅樹, 小林メイ
- Journal Title
  Applied Mathematics (Iwanami Shoten) Vol. 15, No. 1
  Pages: 53-57
- NAID
  110001888937
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
[Journal Article] Music Preference Data Representation Using OWL and Its Application to Music Information Recommendation (2005)
- Author(s)
  武内裕一, 青野雅樹
- Journal Title
  IEICE Technical Report DE2005-66 (2005-7)
  Pages: 7-11
- Related Report
  2005 Annual Research Report
[Journal Article] XML Conversion and Content Analysis of HTML Syllabi Using Information Science Courses (2005)
- Author(s)
  平野健太郎, 青野雅樹
- Journal Title
  3rd Meeting on Web Intelligence and Interaction, WI2-2005-42
  Pages: 83-88
- Related Report
  2005 Annual Research Report
[Journal Article] A Method for Query Expansion Using a Hierarchy of Clusters (2005)
- Author(s)
  Masaki Aono, Hironori Doi
- Journal Title
  AIRS 2005 (Asia Information Retrieval Symposium), in a book: Information Retrieval Technology, Lecture Notes in Computer Science 3689 (Gary G. Lee, et al. eds.) (Springer Verlag)
  Pages: 479-484
- Related Report
  2005 Annual Research Report
[Journal Article] A Query Expansion Method Using Co-clustering (2005)
- Author(s)
  土肥広典, 青野雅樹
- Journal Title
  2nd Web Intelligence Technical Meeting (IEICE)
  Pages: 43-48
- Related Report
  2004 Annual Research Report
[Journal Article] Web Community Mining (2005)
- Author(s)
  青野雅樹, 小林メイ
- Journal Title
  Applied Mathematics Vol. 15, No. 1
  Pages: 53-57
- NAID
  110001888937
- Related Report
  2004 Annual Research Report
[Journal Article] Vector Space Models for Search and Cluster Mining (2004)
- Author(s)
  Mei Kobayashi, Masaki Aono
- Journal Title
  Survey of Text Mining (Michael W. Berry, ed.) (Springer)
  Pages: 103-122
- Description
  From the Final Research Report Summary (English)
- Related Report
  2006 Final Research Report Summary
[Book] Survey of Text Mining (Chapter 5) (2004)
- Author(s)
  Mei Kobayashi, Masaki Aono
- Publisher
  Springer-Verlag
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2006 Final Research Report Summary
