Toward Optimal Feature Selection for Word Sense Disambiguation and its Application to Information Retrieval
Project/Area Number |
13680441
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | University of Yamanashi |
Principal Investigator |
FUKUMOTO Fumiyo University of Yamanashi, Faculty of Engineering, Associate Professor (60262648)
|
Project Period (FY) |
2001 – 2002
|
Project Status |
Completed (Fiscal Year 2002)
|
Budget Amount |
¥3,400,000 (Direct Cost: ¥3,400,000)
Fiscal Year 2002: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2001: ¥2,000,000 (Direct Cost: ¥2,000,000)
|
Keywords | Word Sense Disambiguation / Feature Selection / Information Retrieval / Disambiguation of Polysemous Words / Machine Learning / Automatic Text Classification |
Research Abstract |
This study mainly describes three methods. The first is a feature selection method for word sense disambiguation. In this method, a machine learning technique selects a set of features for each sense of an ambiguous word. Empirical results on two data sets, the 'line' and 'interest' data and the SENSEVAL1 data, show that the performance of the method is comparable to that of existing sense disambiguation techniques. The second is a method for learning text representations for the categorization task. Words in a text are represented by a variation on the WordNet synset, and a machine learning technique is applied to induce a representative model. The results show that incorporating WordNet into the text representation can lead to improvements, especially for rare categories. The third is a text classification method that handles a large collection of data using two well-known machine learning techniques, Naive Bayes (NB) and Support Vector Machines (SVMs). NB assumes that the words in a text are independent of one another, which makes its computation far more efficient. SVMs, on the other hand, can handle large feature spaces, which makes better performance possible. The training data for the SVMs are extracted using NB classifiers according to the category hierarchies, which reduces the amount of computation necessary for classification without sacrificing accuracy.
|
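The word-independence assumption behind NB, mentioned in the abstract, can be illustrated with a minimal sketch. This is not the project's implementation: the function names (`train_nb`, `log_scores`, `classify`), the add-one (Laplace) smoothing, and the toy documents are all assumptions made for illustration. The point is only that, under independence, a category's score reduces to a sum of per-word log-probabilities, which is cheap to compute.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Collect the counts a multinomial Naive Bayes model needs.

    docs: list of (tokens, label) pairs. Hypothetical helper, not from the source.
    """
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def log_scores(model, tokens):
    """Return log P(label) + sum of log P(word | label) for each label."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in tokens:
            if word not in vocab:
                continue  # ignore words never seen in training
            # Independence assumption: each word contributes its own term.
            # Add-one smoothing (an assumption here) avoids log(0).
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return scores

def classify(model, tokens):
    """Pick the label with the highest log score."""
    scores = log_scores(model, tokens)
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
docs = [
    (["bank", "interest", "rate"], "finance"),
    (["loan", "interest", "bank"], "finance"),
    (["phone", "line", "call"], "telecom"),
    (["line", "network", "call"], "telecom"),
]
model = train_nb(docs)
```

Calling `classify(model, ["interest", "rate"])` scores each category with one pass over the input words. In a hierarchical setup such as the one the abstract describes, cheap scores of this kind could plausibly be used to narrow down the training data passed to a more expensive classifier, although the selection procedure itself is not detailed in this record.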
Report
(3 results)
Research Products
(14 results)