Corpus-based Word Sense Disambiguation and its application to Information Retrieval

Research Project

Project/Area Number 15500087
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution University of Yamanashi

Principal Investigator

FUKUMOTO Fumiyo  University of Yamanashi, Department of Research Interdisciplinary Graduate School of Medicine and Engineering, Associate Professor (60262648)

Project Period (FY) 2003 – 2004
Project Status Completed (Fiscal Year 2004)
Budget Amount
¥3,900,000 (Direct Cost: ¥3,900,000)
Fiscal Year 2004: ¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 2003: ¥2,900,000 (Direct Cost: ¥2,900,000)
Keywords Word Sense Disambiguation / Category Hierarchies / Detecting and Correcting Category Errors / Similarity Calculation
Research Abstract

In this work, we proposed a method for disambiguating word senses and applied the results to query expansion in Information Retrieval.
We focused mainly on the following four methods.
(1) Learning Subject Drift for Topic Tracking
In topic tracking, where data is collected over an extended period of time, the discussion of a topic, i.e. the subject of a story, changes over time. This work focuses on subject drift and presents a method for topic tracking on broadcast news stories that handles it. The basic idea is to automatically extract the optimal positive training data for the target topic so that it includes only data sufficiently related to the current subject. The method was tested on the TDT1 and TDT2 corpora, and the results show its effectiveness.
(2) Correcting Category Errors in Text Classification
We proposed a method for correcting category annotation errors in multi-labeled data, which degrade the overall performance of text classification. We used the hierarchical structure of the categories as a simple heuristic: the corrected category should be at the same level as, or a parent or child of, the category originally assigned to a document. Experimental results with the Reuters 96 corpora show that our method achieves high precision in detecting and correcting annotation errors. Further, the corrections improve text classification accuracy.
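The hierarchy heuristic in (2) can be illustrated with a small sketch. This is not the project's implementation: the toy hierarchy, the function names `candidate_corrections` and `correct_label`, and the classifier scores are all hypothetical, and "same level" is assumed here to mean categories that share the same parent.

```python
# Hypothetical toy hierarchy: child -> parent (None marks a top-level category).
PARENT = {
    "economics": None,
    "markets": None,
    "inflation": "economics",
    "trade": "economics",
    "equities": "markets",
}

def candidate_corrections(original, parent=PARENT):
    """Return the parent, children, and same-parent siblings of `original`."""
    up = parent[original]
    candidates = set()
    if up is not None:
        candidates.add(up)  # parent of the original category
    for cat, p in parent.items():
        if cat == original:
            continue
        if p == original:
            candidates.add(cat)  # child of the original category
        elif p == up:
            candidates.add(cat)  # same level (assumed: shares the parent)
    return candidates

def correct_label(original, scores, parent=PARENT):
    """Pick the best-scoring category among the original and its allowed neighbours.

    `scores` is an assumed mapping from category to classifier confidence;
    categories outside the allowed set are ignored even if they score higher.
    """
    allowed = candidate_corrections(original, parent) | {original}
    return max(allowed, key=lambda c: scores.get(c, float("-inf")))
```

For example, with the original label "inflation", a high score for the unrelated category "equities" is ignored, and only "economics", "trade", or "inflation" itself can be returned.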
(3) A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora
We addressed the problem of dealing with a large collection of data and investigated automatically constructing a category hierarchy from a given set of categories to improve the classification of large corpora. We used two well-known techniques, k-means (a partitioning clustering method) and a loss function, to create the category hierarchy. K-means clusters the given categories into a hierarchy. To select the proper value of k, we use a loss function that measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. Once the optimal k is selected, the procedure is repeated for each cluster. Our evaluation using the 1996 Reuters corpus, which consists of 806,791 documents, shows that the automatically constructed hierarchy improves classification accuracy.
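The recursive procedure in (3) can be sketched roughly as follows. This is only an illustration under stated assumptions: categories are represented as hypothetical numeric vectors, and the loss used in the project (which compares the true distribution with the learner's prediction) is replaced by a stand-in, the within-cluster squared error plus a penalty per cluster. All function names are invented for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ended up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def stand_in_loss(clusters, penalty=1.0):
    """Stand-in for the project's loss: squared error plus a cost per cluster."""
    sse = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    return sse + penalty * len(clusters)

def build_hierarchy(points, max_k=3):
    """Choose k by minimum loss, split, then repeat the procedure per cluster."""
    if len(points) < 2:
        return points  # leaf: nothing left to split
    candidates = [kmeans(points, k) for k in range(1, min(max_k, len(points)) + 1)]
    best = min(candidates, key=stand_in_loss)
    if len(best) == 1:
        return points  # leaf: not splitting gives the lowest loss
    return [build_hierarchy(c, max_k) for c in best]
```

Calling `build_hierarchy` on two well-separated groups of category vectors yields a nested list with one sub-list per group; each group stays a leaf when splitting it further would not lower the stand-in loss.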
(4) Word Sense Disambiguation in Information Retrieval
We proposed a feature selection method for disambiguating word senses. In our method, a set of features corresponding to each sense of an ambiguous word is selected by applying a statistical technique. Further, we applied the results to query expansion in Information Retrieval.

Report

(3 results)
  • 2004 Annual Research Report   Final Research Report Summary
  • 2003 Annual Research Report
  • Research Products

    (12 results)


All Journal Article (10 results) Publications (2 results)

  • [Journal Article] Using Category Hierarchies for Correcting Category Errors in Multilabeled Data (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      2nd Language & Technology Conference (To appear)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Using Category Hierarchies for Correcting Category Errors in Multi-labeled Data (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      2nd Language and Technology Conference (To appear)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
  • [Journal Article] Using Category Hierarchies for Correcting Category Errors in a Corpus (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the Asia Information Retrieval Symposium

      Pages: 277-280

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
  • [Journal Article] Learning Subject Drift for Topic Tracking (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 8th International Conference on Spoken Language Processing

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
  • [Journal Article] Correcting Category Errors in Text Classification (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 20th International Conference on Computational Linguistics

      Pages: 868-875

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
  • [Journal Article] A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 8th Conference on Computational Natural Language Learning

      Pages: 65-72

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
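  • [Journal Article] Using Large-Scale Language Data for Intelligent Information Retrieval (2004)

    • Author(s)
      FUKUMOTO Fumiyo
    • Journal Title

      IEICE General Conference 2004, Tutorial Lecture

    • Description
      From the Final Research Report Summary (Japanese)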
    • Related Report
      2004 Annual Research Report 2004 Final Research Report Summary
  • [Journal Article] Learning Subject Drift for Topic Tracking (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 8th International Conference on Spoken Language Processing

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A Comparison of Manual and Automatic Constructions of Category Hierarchy For Classifying Large Corpora (2004)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 8th Conference on Computational Natural Language Learning

      Pages: 65-72

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Using Very Large Corpora for Intelligent Information Retrieval (2004)

    • Author(s)
      F. Fukumoto
    • Journal Title

      Proc. of the IEICE

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2004 Final Research Report Summary
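  • [Publications] Fukumoto Fumiyo, Suzuki Yoshimi, Yamada Hiroyasu: "Automatic Extraction of Follow-up Articles Based on Topic Shift" IPSJ Journal (Information Processing Society of Japan). 44(7). 1766-1777 (2003)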

    • Related Report
      2003 Annual Research Report
  • [Publications] Fukumoto Fumiyo, Suzuki Yoshimi: "A Comparison of Manual and Automatic Construction of Category Hierarchy for Classifying Large Corpora" Eighth Conference on Computational Natural Language Learning. (to appear). (2004)

    • Related Report
      2003 Annual Research Report


Published: 2003-04-01   Modified: 2016-04-21  
