A study of natural language learning by complementary use of tagged and untagged corpus
Project/Area Number |
13680429
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | IBARAKI UNIVERSITY |
Principal Investigator |
SHINNOU Hiroyuki Ibaraki University, College of Engineering, Associate Professor (10250987)
|
Project Period (FY) |
2001 – 2002
|
Project Status |
Completed (Fiscal Year 2002)
|
Budget Amount |
¥2,500,000 (Direct Cost: ¥2,500,000)
Fiscal Year 2002: ¥1,200,000 (Direct Cost: ¥1,200,000)
Fiscal Year 2001: ¥1,300,000 (Direct Cost: ¥1,300,000)
|
Keywords | Unsupervised learning / Co-training / EM algorithm / Machine Learning / WSD / Senseval-2 / Fuzzy Clustering |
Research Abstract |
The inductive learning approach has been highly successful in natural language processing. However, it has a serious drawback: inductive learning requires tagged training data, which is expensive to produce. This study aims to overcome this issue by using untagged corpora, an approach that falls under unsupervised learning. Most unsupervised learning methods use multiple views. Co-training, proposed by Blum et al., and a method using the EM algorithm, proposed by Nigam et al., are representative examples. These methods were originally used for document classification, and it was unknown whether they could be applied to word sense disambiguation, a central problem in natural language processing. In the first year, I studied Co-training and proposed a method that relaxes the independence condition on the two feature sets. I presented this method at an international conference this year. This year I also studied the method using the EM algorithm and applied it to the Japanese translation task of SENSEVAL-2. In doing so, I showed that the method proposed by Nigam et al. can be applied to word sense disambiguation problems. This research was published in a journal. In that paper, I also showed that the method often fails to improve the performance of the learned rules. To overcome this problem, I proposed a method that uses cross validation and ad hoc judgments. This method achieved strong performance on the Japanese dictionary task of SENSEVAL-2. In particular, its score for nouns is comparable to the best published score. These results were presented at a workshop, and a paper on the method has been accepted at an international conference. Next, I studied fuzzy clustering, which is essentially similar to the EM algorithm, and used it as a kind of unsupervised learning method. This work was presented at a workshop held in March.
|
Report
(3 results)
Research Products
(23 results)