A Study on Term-Document Clustering based on a co-evolutionary framework

Research Project

Project/Area Number	13680473
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	National Institute of Informatics
Principal Investigator	AIZAWA Akiko National Institute of Informatics, Information Infrastructure Research Division, Associate Professor (90222447)
Project Period (FY)	2001 – 2002
Project Status	Completed (Fiscal Year 2002)
Budget Amount	¥4,100,000 (Direct Cost: ¥4,100,000) Fiscal Year 2002: ¥2,200,000 (Direct Cost: ¥2,200,000) Fiscal Year 2001: ¥1,900,000 (Direct Cost: ¥1,900,000)
Keywords	Information Retrieval / Dual Clustering / Automatic Text Categorization / Probability Weighted Information / Micro Clustering / Co-evolutionary Algorithms / Evolutionary Computation / NACSIS Academic Conference Paper Database
Research Abstract	In this study, we proposed a new framework for information retrieval, which we call "cluster-based indexing", and evaluated its effectiveness on actual document collections. The proposed scheme clusters documents and terms simultaneously, using the previously proposed "probability weighted amount of information" as the guiding criterion. Its distinguishing feature is that it exploits the extracted associations between terms and documents by treating them as 'indices' in conventional retrieval systems. The scheme can also be regarded as an adaptation of the "co-evolutionary framework" of genetic algorithms to the domain of text retrieval: it first randomly initiates clusters of neighboring terms and documents, and then applies local optimization to the generated clusters in order to cope with the large scale of real-world document collections. We investigated the effectiveness of the proposed method using test collections of 10,000 to 100,000 documents: abstracts of academic conference papers extracted from NTCIR1, newspaper articles from the Mainichi and Nikkei CD-ROM databases, and English stories from Reuters or the Financial Times. In an evaluation using a text categorization task, the categorization performance of the generated clusters was confirmed to be slightly worse than, but almost comparable to, that of the Support Vector Machine, which is known to be one of the best classifiers for text categorization. Furthermore, the method was shown to successfully extract associations between documents on class borders, which is difficult with conventional machine-learning-based categorization methods.
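
As a rough illustration of the kind of criterion the abstract refers to, the sketch below scores term-document pairs and a candidate cluster. This record does not define the "probability weighted amount of information", so the sketch assumes it is the per-pair contribution to mutual information, P(t,d) * log2(P(t,d) / (P(t) P(d))). The function names pair_weights and cluster_score and the toy counts are hypothetical and are not taken from the project.

```python
import math
from collections import defaultdict


def pair_weights(counts):
    """Weight each (term, doc) pair by P(t,d) * log2(P(t,d) / (P(t) * P(d))).

    counts maps (term, doc) -> co-occurrence frequency.
    ASSUMPTION: this formula stands in for the criterion named in the
    abstract; the record itself does not give the definition.
    """
    total = sum(counts.values())
    term_freq = defaultdict(int)
    doc_freq = defaultdict(int)
    for (term, doc), freq in counts.items():
        term_freq[term] += freq
        doc_freq[doc] += freq

    weights = {}
    for (term, doc), freq in counts.items():
        if freq == 0:
            continue  # a zero count contributes nothing
        p_td = freq / total
        p_t = term_freq[term] / total
        p_d = doc_freq[doc] / total
        weights[(term, doc)] = p_td * math.log2(p_td / (p_t * p_d))
    return weights


def cluster_score(weights, terms, docs):
    """Sum the pair weights that fall inside a (terms, docs) cluster."""
    return sum(w for (t, d), w in weights.items() if t in terms and d in docs)


# Toy data: two terms, each occurring in exactly one document.
toy_counts = {("cluster", "doc1"): 1, ("index", "doc2"): 1}
toy_weights = pair_weights(toy_counts)
toy_score = cluster_score(toy_weights, {"cluster"}, {"doc1"})
```

Under this assumption, a cluster that groups terms with the documents they actually co-occur in scores higher than one that mixes unrelated terms and documents, which is the property a local optimization step would try to increase.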

Report

(3 results)

2002 Annual Research Report Final Research Report Summary
2001 Annual Research Report

Research Products
(27 results)

All Publications (27 results)

[Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures"Information Processing & Management. 39. 45-65 (2003)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data"Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems"Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents"PRICAI 2002 : Trends in Artificial Intelligence, M. Ishizuka, A. Sattar (Eds). LNAI2417 Springer. 404-413 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization"Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers"Systems and Computers in Japan. 34(3). 85-95 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: ""Designed Sampling with Crossover Operators", chapter of "Advances in Evolutionary Computing" edited by A. Ghosh and S. Tsutsui"Springer. 413-439 (2003)
- Description
  From "Final Research Report Summary (Japanese)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures"Information Processing & Management. 39. 45-65 (2003)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data"Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems"Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization"Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa and Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers"Systems and Computers in Japan. 34(3). 85-95 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa, M. Ishizuka, A. Sattar (Eds.): "An Approach to Microscopic Clustering of Terms and Documents, in PRICAI 2002 : Trends in Artificial Intelligence. LNAI2417"Springer. 404-413 (2002)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa, edited by A. Ghosh and S. Tsutsui: "Designed Sampling with Crossover Operators, in chapter of Advances in Evolutionary Computing"Springer. (2003)
- Description
  From "Final Research Report Summary (English)"
- Related Report
  2002 Final Research Report Summary
[Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures"Information Processing & Management. 39. 45-65 (2003)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data"Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems"Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents"PRICAI 2002 : Trends in Artificial Intelligence, M. Ishizuka, A. Sattar (Eds). LNAI2417 Springer. 404-413 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa: "A Study on Micro-Clustering of Text Documents (in Japanese)"IPSJ SIG on Natural Language Processing. NL-150. 111-117 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers"Systems and Computers in Japan. 34(3). 85-95 (2002)
- Related Report
  2002 Annual Research Report
[Publications] Akiko Aizawa: ""Designed Sampling with Crossover Operators", chapter of "Advances in Evolutionary Computing" edited by A.Ghosh and S.Tsutsui"Springer. 413-439 (2003)
- Related Report
  2002 Annual Research Report
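[Publications] Akiko Aizawa: "An Approach to Text Categorization Problems Using Naive Methods (in Japanese)"Proceedings of the 2001 Workshop on Information-Based Induction Sciences. 123-128 (2001)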
- Related Report
  2001 Annual Research Report
[Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization"Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)
- Related Report
  2001 Annual Research Report
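[Publications] Akiko Aizawa: "An Approach to Large-Scale Text Categorization Problems Using Naive Methods (in Japanese)"IPSJ SIG Technical Reports on Natural Language Processing. 147-7. 41-46 (2002)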
- Related Report
  2001 Annual Research Report
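[Publications] Akiko Aizawa: "An Attempt at Dual Clustering in Information Space (in Japanese)"JSAI SIG on Foundations of Artificial Intelligence, Materials of the 48th Meeting. SIG-FAI-A104. 85-90 (2002)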
- Related Report
  2001 Annual Research Report
[Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures"Information Processing & Management. (accepted).
- Related Report
  2001 Annual Research Report
[Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems"the IEEE 2002 Congress on Evolutionary Computation. (accepted).
- Related Report
  2001 Annual Research Report
