
Toward Optimal Feature Selection for Word Sense Disambiguation and its Application to Information Retrieval

Research Project

Project/Area Number: 13680441
Research Category: Grant-in-Aid for Scientific Research (C)
Allocation Type: Single-year Grants
Section: General
Research Field: Intelligent informatics
Research Institution: University of Yamanashi

Principal Investigator

FUKUMOTO Fumiyo, Univ. of Yamanashi, Faculty of Engineering, Associate Professor (60262648)

Project Period (FY): 2001 – 2002
Project Status: Completed (Fiscal Year 2002)
Budget Amount:
¥3,400,000 (Direct Cost: ¥3,400,000)
Fiscal Year 2002: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2001: ¥2,000,000 (Direct Cost: ¥2,000,000)
Keywords: Word Sense Disambiguation / Feature Selection / Information Retrieval / Resolution of Polysemous Word Ambiguity / Machine Learning / Automatic Document Classification
Research Abstract

This study describes three main methods.

The first is a feature selection method for word sense disambiguation: sets of features corresponding to each sense of an ambiguous word are selected by applying a machine learning technique. Empirical results on two datasets, the 'line' and 'interest' data and the SENSEVAL-1 data, show that the method performs comparably to existing sense disambiguation techniques.

The second is a method for learning a text representation for the categorization task. Words in a text are represented by a variation on WordNet synsets, and a machine learning technique is applied to induce a representative model. The results show that incorporating WordNet into the text representation can lead to improvements, especially for rare categories.

The third is a text classification method that manipulates a large collection of data using two well-known machine learning techniques, Naive Bayes (NB) and Support Vector Machines (SVMs). NB assumes word independence within a text, which makes its computation far more efficient; SVMs, on the other hand, can handle large feature spaces, which makes better performance possible. The training data for the SVMs are extracted using NB classifiers according to the category hierarchies, which reduces the amount of computation needed for classification without sacrificing accuracy.
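
The abstract does not reproduce the selection procedure itself, so the following is a minimal sketch of the underlying idea only: scoring candidate context features against the sense labels and keeping the most discriminative ones per sense. The toy contexts, the chi-squared scorer, and the Naive Bayes classifier are all illustrative assumptions, not the project's actual method.

```python
# Hypothetical sketch: per-sense feature selection for word sense
# disambiguation. Chi-squared scoring stands in for the paper's method;
# the mini-corpus below is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented contexts for the ambiguous word "line", with sense labels.
contexts = [
    "please hold the line while I transfer your call",
    "the telephone line was busy all morning",
    "draw a straight line between the two points",
    "the artist sketched a thin line across the page",
]
senses = ["phone", "phone", "mark", "mark"]

# Keep only the k context words most associated with the sense labels,
# then train a classifier on the reduced feature set.
wsd = make_pipeline(
    CountVectorizer(stop_words="english"),
    SelectKBest(chi2, k=4),       # k is an arbitrary illustrative value
    MultinomialNB(),
)
wsd.fit(contexts, senses)
print(wsd.predict(["she drew a faint line across the points"]))
```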
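For the second method, the representation is described only as a variation on WordNet synsets (with hypernym relations, per the publication list). A minimal sketch of the core idea, collapsing synonyms into shared synset features with NLTK's WordNet interface, is given below; the first-noun-sense heuristic is an assumption made for illustration.

```python
# Requires: pip install nltk, then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def synset_features(text):
    """Map each word to the name of its first noun synset, so that
    synonyms (e.g. "car" / "automobile") share a single feature."""
    feats = []
    for word in text.lower().split():
        synsets = wn.synsets(word, pos=wn.NOUN)
        # Fall back to the surface form when WordNet has no entry.
        feats.append(synsets[0].name() if synsets else word)
    return feats

print(synset_features("car exports rose"))         # starts with 'car.n.01'
print(synset_features("automobile exports rose"))  # also starts with 'car.n.01'
```

Because "car" and "automobile" map to the same feature, documents in rare categories pool their evidence rather than scattering it over surface forms, which is consistent with the reported gains on rare categories.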
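The abstract gives the NB-then-SVM pipeline only in outline, so the sketch below is a flat, simplified stand-in: a synthetic dataset, an invented confidence threshold, and GaussianNB/LinearSVC as placeholders. It shows the stated division of labor, a cheap NB pass filtering the training pool so the more expensive SVM trains on fewer examples, but it omits the category hierarchies the actual method exploits.

```python
# Hypothetical two-stage NB -> SVM training sketch (not the project's
# hierarchy-aware procedure).
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic stand-in for a large document collection.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

# Stage 1: a cheap Naive Bayes pass scores every example; keep only the
# ones it labels confidently, shrinking the SVM's training pool.
nb = GaussianNB().fit(X, y)
keep = nb.predict_proba(X).max(axis=1) > 0.8   # invented threshold

# Stage 2: the more expensive SVM trains on the reduced pool only.
svm = LinearSVC().fit(X[keep], y[keep])
print(f"SVM trained on {keep.sum()} of {len(y)} examples")
```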

Report (3 results)

  • 2002 Annual Research Report
  • 2002 Final Research Report Summary
  • 2001 Annual Research Report

Research Products: All Publications (14 results)

  • [Publications] 福本 文代: "語義の曖昧性解消のための最適な属性選択" 情報処理学会論文誌 43(1), 20-33 (2002)

    • Description
      From the Summary of the Final Research Report (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] 福本 文代, 鈴木 良弥: "WordNetの同義語クラスとその上位関係を利用した文書の自動分類" 情報処理学会論文誌 43(6), 1852-1865 (2002)

    • Description
      From the Summary of the Final Research Report (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] F. Fukumoto, Y. Suzuki: "Manipulating Large Corpora for Text Classification" Conference on Empirical Methods in Natural Language Processing (EMNLP'02), 196-203 (2002)

    • Description
      From the Summary of the Final Research Report (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] F. Fukumoto, Y. Suzuki: "Detecting Shifts in News Stories for Paragraph Extraction" 19th International Conference on Computational Linguistics (COLING'02), 280-286 (2002)

    • Description
      From the Summary of the Final Research Report (Japanese)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Fumiyo Fukumoto: "Toward Optimal Feature Selection for Word Sense Disambiguation" Transactions of Information Processing Society of Japan 43(1), 20-33 (2002)

    • Description
      From the Summary of the Final Research Report (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Fumiyo Fukumoto, Yoshimi Suzuki: "Using Synonyms and Their Hypernymy Relations of WordNet for Text Classification" Transactions of Information Processing Society of Japan 43(6), 1852-1865 (2002)

    • Description
      From the Summary of the Final Research Report (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Fumiyo Fukumoto, Yoshimi Suzuki: "Manipulating Large Corpora for Text Classification" Conference on Empirical Methods in Natural Language Processing (EMNLP'02), 196-203 (2002)

    • Description
      From the Summary of the Final Research Report (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Fumiyo Fukumoto, Yoshimi Suzuki: "Detecting Shifts in News Stories for Paragraph Extraction" 19th International Conference on Computational Linguistics (COLING'02), 280-286 (2002)

    • Description
      From the Summary of the Final Research Report (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] 福本 文代: "語義の曖昧性解消のための最適な属性選択" 情報処理学会論文誌 43(1), 20-33 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] 福本 文代, 鈴木 良弥: "WordNetの同義語クラスとその上位関係を利用した文書の自動分類" 情報処理学会論文誌 43(6), 1852-1865 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] F. Fukumoto, Y. Suzuki: "Manipulating Large Corpora for Text Classification" Conference on Empirical Methods in Natural Language Processing (EMNLP'02), 196-203 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] F. Fukumoto, Y. Suzuki: "Detecting Shifts in News Stories for Paragraph Extraction" 19th International Conference on Computational Linguistics (COLING'02), 280-286 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] F. Fukumoto, Y. Suzuki: "Learning Lexical Representation for Text Categorization" Proc. of the NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, 156-161 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] 福本 文代: "語義の曖昧性解消のための最適な属性選択" 情報処理学会論文誌 43(1), 20-33 (2002)

    • Related Report
      2001 Annual Research Report

Published: 2001-04-01   Modified: 2016-04-21  
