2002 Fiscal Year Final Research Report Summary

A study of natural language learning by complementary use of tagged and untagged corpus

Research Project

Project/Area Number	13680429
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	IBARAKI UNIVERSITY
Principal Investigator	SHINNOU Hiroyuki Ibaraki University, College of Engineering, Associate Professor (10250987)
Project Period (FY)	2001 – 2002
Keywords	Unsupervised learning / Co-training / EM algorithm / Machine Learning / WSD / Senseval-2 / Fuzzy Clustering
Research Abstract	The inductive learning approach has been very successful in natural language processing. However, it has a serious drawback: it requires tagged training data, which is expensive to produce. This study aims to overcome that problem by using an untagged corpus, that is, by unsupervised learning. Most unsupervised learning methods use multiple views. Representative examples are Co-training, proposed by Blum et al., and a method using the EM algorithm, proposed by Nigam et al. These methods were used for document classification, and it was unknown whether they could be applied to word sense disambiguation, a central problem in natural language processing. Last year, I studied Co-training and proposed a method that relaxes the independence condition on the two feature sets; I presented this method at an international conference this year. This year I also studied a method using the EM algorithm and applied it to the Japanese translation task of Senseval-2, showing that the method proposed by Nigam et al. can be applied to word sense disambiguation problems. This research was published in a journal. In that paper, I also showed that the method often fails to improve the performance of the learned rules. To overcome this problem, I proposed a method using cross validation and ad hoc judgments. This method achieved strong performance on the Japanese dictionary task of Senseval-2; in particular, its score for nouns is as high as the best published score. These results were presented at a workshop, and a paper on the method has been accepted at an international conference. Next, I studied fuzzy clustering, which is essentially similar to the EM algorithm, and used it as a kind of unsupervised learning method. This study was presented at a workshop held in March.

Research Products
(12 results)

All Publications (12 results)

[Publications] H. Shinnou: "Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features" LREC-02. 4. 1380-1384 (2002)
- Description
  From the Final Research Report Summary (Japanese version)
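[Publications] Hiroyuki Shinnou, Minoru Sasaki: "Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm" NL SIG notes of IPSJ. 152-8. 51-58 (2002)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] Hiroyuki Shinnou, Minoru Sasaki: "Fast method of word sense disambiguation using information retrieval technique" NL SIG notes of IPSJ. 152-9. 57-62 (2002)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] Atsushi Takahashi, Hiroyuki Shinnou: "Unsupervised learning of word sense disambiguation rules by Fuzzy clustering" 9th Annual Meeting of the Association for Natural Language Processing. 306-309 (2003)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] Hiroyuki Shinnou: "Application of unsupervised learning using EM algorithm to Japanese Translation Task" Journal of Natural Language Processing. 10. 61-73 (2003)
- Description
  From the Final Research Report Summary (Japanese version)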
[Publications] H. Shinnou, M. Sasaki: "Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm" Seventh Conference on Natural Language Learning. 41-48 (2003)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] Hiroyuki Shinnou: "Learning of word sense disambiguation rules by Co-training, checking co-occurrence of features" Proc. LREC-02. 1380-1384 (2002)
- Description
  From the Final Research Report Summary (English version)
[Publications] Hiroyuki Shinnou and Minoru Sasaki: "Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm" NL SIG notes of IPSJ. NL-152-8. 51-58 (2002)
- Description
  From the Final Research Report Summary (English version)
[Publications] Hiroyuki Shinnou and Minoru Sasaki: "Fast method of word sense disambiguation using information retrieval technique" NL SIG notes of IPSJ. NL-152-9. 57-62 (2002)
- Description
  From the Final Research Report Summary (English version)
[Publications] Atsushi Takahashi and Hiroyuki Shinnou: "Unsupervised learning of word sense disambiguation rules by Fuzzy clustering" Proc. of 9th Annual Meeting of the Association for NLP. 306-309 (2003)
- Description
  From the Final Research Report Summary (English version)
[Publications] Hiroyuki Shinnou: "Application of unsupervised learning using EM algorithm to Japanese Translation Task" Journal of Natural Language Processing. 10, No. 3. 61-73 (2003)
- Description
  From the Final Research Report Summary (English version)
[Publications] Hiroyuki Shinnou and Minoru Sasaki: "Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm" 7th Conference on Natural Language Learning. 41-48 (2003)
- Description
  From the Final Research Report Summary (English version)
