Automatic Acquisition of Linguistic Knowledge Using Bilingual Comparable Corpora and its Application to Topic Tracking

Research Project

Project/Area Number	17500091
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	University of Yamanashi
Principal Investigator	FUKUMOTO Fumiyo, University of Yamanashi, Department of Research Interdisciplinary Graduate School of Medicine and Engineering, Associate Professor (60262648)
Project Period (FY)	2005 – 2006
Project Status	Completed (Fiscal Year 2006)
Budget Amount	¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 2006: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2005: ¥2,800,000 (Direct Cost: ¥2,800,000)
Keywords	Comparable Corpora / Polysemous Word / Bilingual Terms / Topic Tracking / Semi-supervised Clustering / Linguistic Knowledge Acquisition / Word Sense / Follow-up Stories / Multilingual Corpora / EM Algorithm / Word Sense Disambiguation
Research Abstract	With the exponential growth of information on the Internet, it is becoming increasingly difficult to find and organize relevant material. Topic Detection and Tracking (TDT) is a research area that addresses this problem and consists of five tasks: story link detection, clustering topic detection, new event detection, story segmentation, and topic tracking. The last task, topic tracking, is the focus of this work. Topic tracking starts from a few sample stories and finds all subsequent stories that discuss the target topic. Here, a topic in the TDT context is something that happens at a specific place and time and is associated with some specific action. In this work, we address the problem of skewed data in topic tracking, namely the small number of stories labeled positive compared with the number of negative stories, and propose a method for estimating effective training stories for the topic tracking task. To compensate for the small number of labeled positive stories, we use bilingual comparable corpora, i.e., English and Japanese corpora, together with the EDR bilingual dictionary, and extract story pairs consisting of positive stories and their associated stories. To handle the large number of labeled negative stories, we classify them into clusters using a semi-supervised clustering algorithm that combines k-means with EM. The method was tested on the TDT English corpus, and the results showed that the system works well when the tracked topic concerns an event originating in the source-language country, even with a small number of initial positive training stories.
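
The abstract does not give the details of the clustering step, so the following is only a minimal sketch of one common way to combine labeled seeds with k-means-style iteration, not the project's implementation. It assumes stories are already represented as numeric vectors, uses hard assignments in place of the soft posteriors of a full EM procedure, and the names `seeded_kmeans`, `seeds`, and `unlabeled` are hypothetical.

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def seeded_kmeans(seeds, unlabeled, max_iter=100):
    """Cluster `unlabeled` vectors, starting from labeled seed groups.

    seeds: dict mapping a cluster label to a non-empty list of labeled vectors.
    Seed vectors keep their label; only unlabeled vectors are reassigned.
    Returns (centroids, assignment), where assignment[i] is the label
    chosen for unlabeled[i].
    """
    # Initialise each centroid from its labeled seeds (the supervised part).
    centroids = {label: mean(pts) for label, pts in seeds.items()}
    assignment = None
    for _ in range(max_iter):
        # E-like step: hard-assign each unlabeled vector to its nearest centroid.
        new_assignment = [
            min(centroids, key=lambda label: dist(x, centroids[label]))
            for x in unlabeled
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # M-like step: re-estimate centroids from seeds plus assigned vectors.
        for label, pts in seeds.items():
            members = pts + [x for x, a in zip(unlabeled, assignment) if a == label]
            centroids[label] = mean(members)
    return centroids, assignment


# Toy data (illustrative only): two seed groups and four unlabeled vectors.
seeds = {"cluster_a": [[0.0, 0.0]], "cluster_b": [[10.0, 10.0]]}
unlabeled = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [10.0, 9.0]]
centroids, assignment = seeded_kmeans(seeds, unlabeled)
print(assignment)
```

Keeping the seed vectors fixed in their clusters is one design choice among several; a soft-assignment EM variant would instead weight every vector by its posterior probability under each cluster.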

Report

(3 results)

2006 Annual Research Report Final Research Report Summary
2005 Annual Research Report

Research Products
(18 results)

[Journal Article] Using Comparable Corpora and Semi-Supervised Clustering for Topic Tracking (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
  Pages: 231-238
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Trans. of IEICE Information and Systems E89-D, 4
  Pages: 1543-1554
- NAID
  110007504507
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  2006 Final Research Report Summary
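[Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  IEICE Transactions (Japanese Edition) J89-D, 3
  Pages: 552-566
- NAID
  110004662710
- Description
  From "Summary of Research Results Report (Japanese)"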
- Related Report
  2006 Final Research Report Summary
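[Journal Article] Correcting Category Errors in Multi-Labeled Data based on the Similarity between Two Examples (2006)
- Author(s)
  H. Hamano, F. Fukumoto
- Journal Title
  IEICE Transactions (Japanese Edition) J89-D, 10
  Pages: 2338-2347
- NAID
  110007380160
- Description
  From "Summary of Research Results Report (Japanese)"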
- Related Report
  2006 Final Research Report Summary
[Journal Article] Using Comparable Corpora and Semi-Supervised Clustering for Topic Tracking (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
  Pages: 231-238
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Trans. of IEICE Information and Systems E89-D, 4
  Pages: 1543-1554
- NAID
  110007504507
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Trans. of IEICE Information and Systems J89-D, 3
  Pages: 552-566
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Correcting Category Errors in Multi-Labeled Data based on the Similarity between Two Examples (2006)
- Author(s)
  H. Hamano, F. Fukumoto
- Journal Title
  Trans. of IEICE Information and Systems J89-D, 10
  Pages: 2338-2347
- NAID
  110007380160
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Using Comparable Corpora and Semi-Supervised Clustering for Topic Tracking (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
  Pages: 231-238
- Related Report
  2006 Annual Research Report
[Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Trans. of IEICE Information and Systems E89-D, 4
  Pages: 1543-1554
- NAID
  110007504507
- Related Report
  2006 Annual Research Report
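[Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  IEICE Transactions (Japanese Edition) J89-D, 3
  Pages: 552-566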
- NAID
  110004662710
- Related Report
  2006 Annual Research Report
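[Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  IEICE Transactions (Japanese Edition) Vol. J89-D, No. 3 (accepted for publication)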
- NAID
  110004662710
- Related Report
  2005 Annual Research Report
[Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)
- Author(s)
  Fumiyo Fukumoto, Yoshimi Suzuki
- Journal Title
  Trans. of IEICE, Information and Systems Vol. E89-D, No. 4 (to appear)
- NAID
  110007504507
- Related Report
  2005 Annual Research Report
[Journal Article] Using Category Hierarchies for Correcting Errors in Multi-Labeled Data (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 2nd Language and Technology Conference (LTC'05)
  Pages: 211-215
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Topic Tracking based on Linguistic Features (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05)
  Pages: 10-21
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Using Category Hierarchies for Correcting Errors in Multi-Labeled Data (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 2nd Language and Technology Conference (LTC'05)
  Pages: 211-215
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Topic Tracking based on Linguistic Features (2005)
- Author(s)
  F. Fukumoto, Y. Suzuki
- Journal Title
  Proc. of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05)
  Pages: 10-21
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  2006 Final Research Report Summary
[Journal Article] Topic Tracking Based on Linguistic Features (2005)
- Author(s)
  Fumiyo Fukumoto, Yusuke Yamaji
- Journal Title
  Proc. of the Second International Joint Conference on Natural Language Processing
  Pages: 10-21
- Related Report
  2005 Annual Research Report
