
2004 Fiscal Year Final Research Report Summary

A Study on Cluster-based Indexing of Textual Data

Research Project

Project/Area Number 15500081
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Media informatics/Database
Research Institution National Institute of Informatics

Principal Investigator

AIZAWA Akiko  National Institute of Informatics, Research Center for Information Resources, Professor (90222447)

Project Period (FY) 2003 – 2004
Keywords Text Mining / Statistical Language Model / Document Clustering / Information Retrieval / Amount of Information / Extraction of Noun Phrases
Research Abstract

In this study, we proposed and implemented a framework for an information retrieval system that exploits clusters of similar documents. The method first generates document clusters, together with their representative terms and phrases, based on term distributions or term-sequence matches. Next, it treats each document cluster as a single virtual document and builds an extended index over these virtual documents. When a query is submitted, the system searches both the original and the extended indices and returns the integrated result. We also demonstrated that indices generated from different viewpoints can make the retrieval system more flexible.
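The indexing flow described in this abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the whitespace tokenizer, the raw term-frequency scoring, and the `cluster_weight` parameter are all assumptions made for illustration, and the clusters are taken as given rather than computed.

```python
from collections import Counter, defaultdict


def build_index(units):
    """Build an inverted index: term -> {unit_id: term frequency}."""
    index = defaultdict(dict)
    for unit_id, text in units.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][unit_id] = tf
    return index


def build_cluster_index(docs, clusters):
    """Treat each cluster as one virtual document (its members concatenated)."""
    virtual_docs = {
        cid: " ".join(docs[doc_id] for doc_id in members)
        for cid, members in clusters.items()
    }
    return build_index(virtual_docs)


def search(query, doc_index, cluster_index, clusters, cluster_weight=0.5):
    """Score documents using both indices and return one merged ranking."""
    scores = Counter()
    for term in query.lower().split():
        # Direct matches from the original (per-document) index.
        for doc_id, tf in doc_index.get(term, {}).items():
            scores[doc_id] += tf
        # Matches from the extended index are shared among cluster members.
        for cid, tf in cluster_index.get(term, {}).items():
            members = clusters[cid]
            for doc_id in members:
                scores[doc_id] += cluster_weight * tf / len(members)
    return scores.most_common()


docs = {
    "d1": "suffix array clustering",
    "d2": "suffix tree indexing",
    "d3": "protein folding",
}
clusters = {"c1": ["d1", "d2"], "c2": ["d3"]}  # hypothetical, assumed given
doc_index = build_index(docs)
cluster_index = build_cluster_index(docs, clusters)
results = search("clustering", doc_index, cluster_index, clusters)
```

In this toy run, `d2` never contains the query term but still receives a small score because it shares a cluster with `d1`, which is the effect the extended index is meant to provide.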
During the research period, we focused on the following topics:
1. A co-clustering method based on co-occurrence statistics and mutual information
2. A suffix-array-based clustering method that exploits the repetition of textual elements, as measured by the proposed coincidence score
3. A framework for cluster-based indexing and its implementation
4. Entity identification using the fast repetition-based clustering method
Future research issues include statistical and analytical text-processing methods for automatically extracting index phrases from the retrieved document set, and methods for identifying textual elements that refer to the same real-world entity.
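Topic 1 above relies on mutual information computed from co-occurrence statistics. As a rough illustration only, the sketch below computes the standard mutual information I(T;D) between terms and documents from a table of co-occurrence counts. The function name and the input layout are assumptions, and the summary does not describe the project's actual co-clustering objective, so this shows only the underlying quantity.

```python
import math
from collections import Counter


def mutual_information(cooc):
    """Return I(T;D) in bits, where cooc maps (term, doc) pairs to counts."""
    total = sum(cooc.values())
    if total == 0:
        return 0.0
    # Marginal probabilities of terms and documents.
    p_term, p_doc = Counter(), Counter()
    for (term, doc), n in cooc.items():
        p_term[term] += n / total
        p_doc[doc] += n / total
    mi = 0.0
    for (term, doc), n in cooc.items():
        if n == 0:
            continue  # zero cells contribute nothing to the sum
        p_joint = n / total
        mi += p_joint * math.log2(p_joint / (p_term[term] * p_doc[doc]))
    return mi


# Terms that perfectly identify their document carry one bit of information.
dependent = {("a", "d1"): 1, ("b", "d2"): 1}
# Terms spread evenly over documents carry none.
independent = {("a", "d1"): 1, ("a", "d2"): 1, ("b", "d1"): 1, ("b", "d2"): 1}
mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

A co-clustering method of this general family groups terms and documents so that as little of this quantity as possible is lost when individual terms and documents are replaced by their clusters.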

  • Research Products

    (16 results)


  • [Journal Article] レコード同定問題に関する研究の課題と現状 [Techniques and Research Trends in Record Linkage Studies] (2005)

    • Author(s)
      相澤彰子 (Akiko Aizawa), 大山敬三 (Keizo Oyama), 高須淳宏 (Atsuhiro Takasu), 安達淳 (Jun Adachi)
    • Journal Title

      電子情報通信学会論文誌 (Journal of IEICE), D1, Vol.J88-D1 No.3

      Pages: 576-589

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] A Fast Linkage Detection Scheme for Multi-Source Information Integration (2005)

    • Author(s)
      Akiko Aizawa, Keizo Oyama
    • Journal Title

      WIRI2005 (International Workshop on Challenges in Web Information Retrieval and Integration)

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Techniques and Research Trends in Record Linkage Studies (2005)

    • Author(s)
      Akiko Aizawa, Atsuhiro Takasu, Keizo Oyama, Jun Adachi
    • Journal Title

      Journal of IEICE Vol.J88-D1 No.3 (in Japanese)

      Pages: 576-589

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] A Fast Linkage Detection Scheme for Multi-Source Information Integration (2005)

    • Author(s)
      Akiko Aizawa, Keizo Oyama
    • Journal Title

      WIRI2005 (International Workshop on Challenges in Web Information Retrieval and Integration)

    • Description
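      From the Final Research Report Summary (English version)
  • [Journal Article] 和英著者キーワードからの多言語類語辞書自動構築の試み [An Approach to Automatic Generation of Multi-lingual Synonymous Terms Dictionary using Japanese-English Bilingual Author's Keywords] (2004)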

    • Author(s)
      相澤彰子 (Akiko Aizawa)
    • Journal Title

      情報管理 (Journal of Information Processing and Management) Vol.47, no.6

      Pages: 401-409

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Record Linkage of Multi-source Databases: Research Trends (2004)

    • Author(s)
      Akiko Aizawa, Atsuhiro Takasu, Keizo Oyama, Jun Adachi
    • Journal Title

      NII Journal (in Japanese) No.8

      Pages: 43-51

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] An Approach to Automatic Generation of Multi-lingual Synonymous Terms Dictionary using Japanese-English Bilingual Author's Keywords (2004)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      Journal of Information Processing and Management (in Japanese) Vol.47 no.6

      Pages: 401-409

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] A Fast Method for Duplicated Entries Detection in Bibliographic Databases (2004)

    • Author(s)
      Akiko Aizawa, Atsuhiro Takasu, Keizo Oyama, Jun Adachi
    • Journal Title

      IPSJ SIG Notes, DBS (in Japanese) Vol.2004 No.45

      Pages: 111-118

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] An Approach to Cluster-based Indexing (2004)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      IPSJ SIG Notes, NL (in Japanese) 159-007

      Pages: 159-007

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Analysis of Source Identified Text Corpora: Exploring the Statistics of the Reused Text and Authorship (2003)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03)

      Pages: 383-390

    • Description
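      From the Final Research Report Summary (Japanese version)
  • [Journal Article] 低頻度語の利用によるテキストの分類性能の改善と評価 [Improving the Performance of Text Categorization Using Low Frequency Terms] (2003)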

    • Author(s)
      相澤彰子 (Akiko Aizawa)
    • Journal Title

      情報処理学会論文誌 (Journal of Information Processing Society of Japan) Vol.44 No.7

      Pages: 1720-1730

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Discovering Homographs using N-partite Graph Clustering (2003)

    • Author(s)
      Hidekazu Nakawatase, Akiko Aizawa
    • Journal Title

      Proceedings of the 6th International Conference on Discovery Science (DS'03)

      Pages: 402-409

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Improving the Performance of Text Categorization Using Low Frequency Terms (2003)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      Journal of Information Processing Society of Japan (in Japanese)

      Pages: 1720-1730

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Extracting and Analyzing Recycled Word Sentences from Text (2003)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      IPSJ SIG Notes, FI 2003-FI-71

      Pages: 189-196

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] On the Analysis of Source Identified Text Corpora (2003)

    • Author(s)
      Akiko Aizawa
    • Journal Title

      The 17th Annual Conference of the Japanese Society for Artificial Intelligence (in Japanese) 1C5-05

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Word Sense Discrimination based on Complete N-partite Graph (2003)

    • Author(s)
      Hidekazu Nakawatase, Akiko Aizawa
    • Journal Title

      Technical Report of IEICE AI2003-2 (in Japanese) 103

      Pages: 7-23

    • Description
      From the Final Research Report Summary (English version)


Published: 2006-07-11  
