A Study on Term-Document Clustering based on a co-evolutionary framework

Research Project

Project/Area Number 13680473
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution National Institute of Informatics

Principal Investigator

AIZAWA Akiko  National Institute of Informatics, Information Infrastructure Research Division, Associate Professor (90222447)

Project Period (FY) 2001 – 2002
Project Status Completed (Fiscal Year 2002)
Budget Amount
¥4,100,000 (Direct Cost: ¥4,100,000)
Fiscal Year 2002: ¥2,200,000 (Direct Cost: ¥2,200,000)
Fiscal Year 2001: ¥1,900,000 (Direct Cost: ¥1,900,000)
Keywords Information Retrieval / Dual Clustering / Automatic Text Categorization / Probability Weighted Information / Micro Clustering / Co-evolutionary Algorithms / Evolutionary Computation / NACSIS Academic Conference Paper Database
Research Abstract

In this study, we proposed a new information retrieval framework, which we call "cluster-based indexing", and evaluated its effectiveness on actual document collections.
The proposed scheme clusters documents and terms simultaneously, using the previously proposed "probability weighted amount of information" as the guiding criterion. Its distinguishing feature is that it aims to exploit the extracted associations between terms and documents by treating them as 'indices' in conventional retrieval systems. The scheme can also be regarded as an adaptation of the "co-evolutionary framework" of genetic algorithms to the domain of text retrieval: it first randomly initializes clusters of neighboring terms and documents, and then applies local optimization to the generated clusters in order to cope with the large scale of real-world document collections.
We also investigated the effectiveness of the proposed method on test collections of 10,000 to 100,000 documents, such as abstracts of academic conference papers extracted from NTCIR1, newspaper articles from the Mainichi and Nikkei CD-ROM databases, and English stories from Reuters or the Financial Times. In an evaluation on a text categorization task, the categorization performance of the generated clusters was slightly worse than, but almost comparable to, that of the Support Vector Machine, which is known to be one of the best classifiers for text categorization. Furthermore, the method was shown to successfully extract associations between documents on class borders, which is difficult for conventional machine-learning-based categorization methods.
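
The clustering procedure outlined in the abstract (random initialization of a small term-document cluster, followed by local optimization) might be sketched as below. This is only an illustrative sketch, not the project's implementation: the abstract names the criterion but does not give its formula, so the code assumes it has the form P(T,D) log(P(T,D) / (P(T) P(D))), and the function names `pwi` and `grow_cluster`, the greedy toggle-one-item search, and the toy co-occurrence counts are all hypothetical.

```python
import math
import random


def pwi(counts, terms, docs, total):
    """Assumed criterion: P(T, D) * log(P(T, D) / (P(T) * P(D)))."""
    joint = sum(counts.get((t, d), 0) for t in terms for d in docs)
    if joint == 0:
        return 0.0
    term_mass = sum(c for (t, _), c in counts.items() if t in terms)
    doc_mass = sum(c for (_, d), c in counts.items() if d in docs)
    p_joint = joint / total
    p_terms = term_mass / total
    p_docs = doc_mass / total
    return p_joint * math.log(p_joint / (p_terms * p_docs))


def grow_cluster(counts, seed_doc, rng):
    """Grow one term-document cluster around seed_doc by greedy local search."""
    total = sum(counts.values())
    all_terms = sorted({t for t, _ in counts})
    all_docs = sorted({d for _, d in counts})
    # Random initialization: the seed document plus one random term it contains.
    terms = {rng.choice([t for t in all_terms if (t, seed_doc) in counts])}
    docs = {seed_doc}
    best = pwi(counts, terms, docs, total)
    improved = True
    while improved:
        improved = False
        # Local optimization: toggle one term or document at a time and
        # keep the move only if it strictly raises the score.
        for t in all_terms:
            trial = terms ^ {t}
            if trial and pwi(counts, trial, docs, total) > best + 1e-12:
                terms, improved = trial, True
                best = pwi(counts, terms, docs, total)
        for d in all_docs:
            trial = docs ^ {d}
            if trial and pwi(counts, terms, trial, total) > best + 1e-12:
                docs, improved = trial, True
                best = pwi(counts, terms, docs, total)
    return terms, docs


# Hypothetical toy data: two disjoint topics, each with two terms and two documents.
counts = {(t, d): 1 for t, d in [
    ("a", "d1"), ("b", "d1"), ("a", "d2"), ("b", "d2"),
    ("x", "d3"), ("y", "d3"), ("x", "d4"), ("y", "d4"),
]}
terms, docs = grow_cluster(counts, "d1", random.Random(0))
print(sorted(terms), sorted(docs))  # ['a', 'b'] ['d1', 'd2']
```

On this toy input the search stops at the block containing the seed document, because adding a term or document from the other topic lowers the assumed score. A real collection would need many seeds and a far more efficient way to evaluate candidate moves.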

Report

(3 results)
  • 2002 Annual Research Report   Final Research Report Summary
  • 2001 Annual Research Report
  • Research Products

    (27 results)

All Publications (27 results)

  • [Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures" Information Processing & Management. 39. 45-65 (2003)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data" Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems" Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents" PRICAI 2002: Trends in Artificial Intelligence, M. Ishizuka, A. Sattar (Eds.). LNAI 2417, Springer. 404-413 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization" Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers" Systems and Computers in Japan. 34(3). 85-95 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "Designed Sampling with Crossover Operators", chapter of "Advances in Evolutionary Computing", edited by A. Ghosh and S. Tsutsui. Springer. 413-439 (2003)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures" Information Processing & Management. 39. 45-65 (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data" Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems" Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization" Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers" Systems and Computers in Japan. 34(3). 85-95 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents" PRICAI 2002: Trends in Artificial Intelligence, M. Ishizuka, A. Sattar (Eds.). LNAI 2417, Springer. 404-413 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "Designed Sampling with Crossover Operators", chapter of "Advances in Evolutionary Computing", edited by A. Ghosh and S. Tsutsui. Springer. (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures" Information Processing & Management. 39. 45-65 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data" Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002). 1-7 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems" Proc. of the IEEE 2002 Congress on Evolutionary Computation. 1787-1792 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents" PRICAI 2002: Trends in Artificial Intelligence, M. Ishizuka, A. Sattar (Eds.). LNAI 2417, Springer. 404-413 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa: "A Study on Micro-Clustering of Text Documents" (in Japanese) IPSJ SIG on Natural Language Processing. NL-150. 111-117 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers" Systems and Computers in Japan. 34(3). 85-95 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] Akiko Aizawa: "Designed Sampling with Crossover Operators", chapter of "Advances in Evolutionary Computing", edited by A. Ghosh and S. Tsutsui. Springer. 413-439 (2003)

    • Related Report
      2002 Annual Research Report
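  • [Publications] Akiko Aizawa: "An Approach to Text Categorization Problems Using Naive Methods" (in Japanese) Proceedings of the 2001 Workshop on Information-Based Induction Sciences. 123-128 (2001)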

    • Related Report
      2001 Annual Research Report
  • [Publications] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization" Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001). 307-314 (2001)

    • Related Report
      2001 Annual Research Report
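  • [Publications] Akiko Aizawa: "An Approach to Large-Scale Text Categorization Problems Using Naive Methods" (in Japanese) IPSJ SIG Technical Report on Natural Language Processing. 147-7. 41-46 (2002)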

    • Related Report
      2001 Annual Research Report
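  • [Publications] Akiko Aizawa: "An Attempt at Dual Clustering in Information Space" (in Japanese) JSAI SIG on Fundamental Problems in Artificial Intelligence (48th meeting). SIG-FAI-A104. 85-90 (2002)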

    • Related Report
      2001 Annual Research Report
  • [Publications] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures" Information Processing & Management. (accepted).

    • Related Report
      2001 Annual Research Report
  • [Publications] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems" The IEEE 2002 Congress on Evolutionary Computation. (accepted).

    • Related Report
      2001 Annual Research Report

Published: 2001-04-01   Modified: 2016-04-21  