Corpus-based Word Sense Disambiguation and its application to Information Retrieval

Research Project

Project/Area Number	15500087
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	University of Yamanashi
Principal Investigator	FUKUMOTO Fumiyo, University of Yamanashi, Department of Research Interdisciplinary Graduate School of Medicine and Engineering, Associate Professor (60262648)
Project Period (FY)	2003 – 2004
Project Status	Completed (Fiscal Year 2004)
Budget Amount	¥3,900,000 (Direct Cost: ¥3,900,000); Fiscal Year 2004: ¥1,000,000 (Direct Cost: ¥1,000,000); Fiscal Year 2003: ¥2,900,000 (Direct Cost: ¥2,900,000)
Keywords	Word Sense Disambiguation / Category Hierarchies / Detecting and Correcting Category Errors / Similarity Calculation
Research Abstract	In this work, we proposed a method for disambiguating word senses and applied the results to query expansion in Information Retrieval. We mainly focused on the following four methods.
(1) Learning Subject Drift for Topic Tracking: In topic tracking, where data is collected over an extended period of time, the discussion of a topic, i.e. the subject of a story, changes over time. This work focuses on subject drift and presents a method for topic tracking on broadcast news stories that handles it. The basic idea is to automatically extract the optimal positive training data for the target topic so that it includes only data sufficiently related to the current subject. The method was tested on the TDT1 and TDT2 corpora, and the results show its effectiveness.
(2) Correcting Category Errors in Text Classification: We proposed a method for correcting category annotation errors in multi-labeled data, which deteriorate the overall performance of text classification. We used the hierarchical structure of the categories as a simple heuristic: the corrected category should be at the same level as, the parent of, or a child of the category originally assigned to a document. Experimental results on the 1996 Reuters corpus show that our method achieves high precision in detecting and correcting annotation errors. Further, the corrections improve text classification accuracy.
(3) A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora: We addressed the problem of dealing with a large collection of data and investigated automatically constructing a category hierarchy from a given set of categories to improve the classification of large corpora. We used two well-known techniques to create the hierarchy: k-means, a partitioning clustering method, and a loss function. K-means clusters the given categories. To select an appropriate value of k, we use a loss function that measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. Once the optimal k is selected, the procedure is repeated for each cluster. Our evaluation on the 1996 Reuters corpus, which consists of 806,791 documents, shows that the automatically constructed hierarchy improves classification accuracy.
(4) Word Sense Disambiguation in Information Retrieval: We proposed a feature selection method for disambiguating word senses. In our method, the sets of features that correspond to each sense of an ambiguous word are selected by applying a statistical technique. Further, we applied the results to query expansion in Information Retrieval.
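The hierarchy heuristic described in item (2) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the category names, and the helper functions `depth`, `allowed_corrections`, and `is_plausible_correction` are all hypothetical, and "same level" is assumed here to mean "same depth in the category tree".

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy: child -> parent (None marks a top-level category).
# These category names are illustrative only, not taken from the Reuters corpus.
PARENT: Dict[str, Optional[str]] = {
    "economics": None,
    "markets": None,
    "trade": "economics",
    "inflation": "economics",
    "equity": "markets",
}


def depth(category: str) -> int:
    """Number of edges between `category` and its top-level ancestor."""
    d = 0
    parent = PARENT[category]
    while parent is not None:
        d += 1
        parent = PARENT[parent]
    return d


def allowed_corrections(original: str) -> Set[str]:
    """Categories at the same depth as, the parent of, or a child of `original`.

    Assumption: "same level" is read as "same depth in the tree".
    """
    same_level = {c for c in PARENT if depth(c) == depth(original)}
    parent = {PARENT[original]} if PARENT[original] is not None else set()
    children = {c for c, p in PARENT.items() if p == original}
    return (same_level | parent | children) - {original}


def is_plausible_correction(original: str, candidate: str) -> bool:
    """True if replacing `original` with `candidate` satisfies the heuristic."""
    return candidate in allowed_corrections(original)


if __name__ == "__main__":
    print(sorted(allowed_corrections("trade")))
    print(is_plausible_correction("trade", "markets"))
```

In this sketch the heuristic only filters candidate labels; how candidates are proposed and scored is left open, since the abstract does not specify it.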

Report

(3 results)

2004 Annual Research Report, Final Research Report Summary
2003 Annual Research Report

Research Products
(12 results)

[Journal Article] Using Category Hierarchies for Correcting Category Errors in Multilabeled Data (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  2nd Language & Technology Conference (to appear)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Using Category Hierarchies for Correcting Category Errors in Multi-labeled Data (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  2nd Language and Technology Conference (to appear)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary
[Journal Article] Using Category Hierarchies for Correcting Category Errors in a Corpus (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the Asia Information Retrieval Symposium
  Pages: 277-280
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary
[Journal Article] Learning Subject Drift for Topic Tracking (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 8th International Conference on Spoken Language Processing
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary
[Journal Article] Correcting Category Errors in Text Classification (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 20th International Conference on Computational Linguistics
  Pages: 868-875
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary
[Journal Article] A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 8th Conference on Computational Natural Language Learning
  Pages: 65-72
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary
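[Journal Article] Using Large-Scale Language Data for Intelligent Information Retrieval (2004)
- Author(s)
  Fumiyo Fukumoto
- Journal Title
  IEICE General Conference 2004, Tutorial Lecture
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2004 Annual Research Report 2004 Final Research Report Summary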
[Journal Article] Learning Subject Drift for Topic Tracking (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 8th International Conference on Spoken Language Processing
- Description
  From the Final Research Report Summary (English)
- Related Report
  2004 Final Research Report Summary
[Journal Article] A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora (2004)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 8th Conference on Computational Natural Language Learning
  Pages: 65-72
- Description
  From the Final Research Report Summary (English)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Using Very Large Corpora for Intelligent Information Retrieval (2004)
- Author(s)
  F. Fukumoto
- Journal Title
  Proc. of the IEICE
- Description
  From the Final Research Report Summary (English)
- Related Report
  2004 Final Research Report Summary
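[Publications] Fukumoto Fumiyo, Suzuki Yoshimi, Yamada Hiroyasu: "Automatic Extraction of Follow-up News Articles Based on Topic Shift" IPSJ Journal (Information Processing Society of Japan). 44(7). 1766-1777 (2003)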
- Related Report
  2003 Annual Research Report
[Publications] Fukumoto Fumiyo, Suzuki Yoshimi: "A Comparison of Manual and Automatic Construction of Category Hierarchy for Classifying Large Corpora" Eighth Conference on Computational Natural Language Learning. (to appear). (2004)
- Related Report
  2003 Annual Research Report
