
2006 Fiscal Year Final Research Report Summary

Automatic Acquisition of Linguistic Knowledge Using Bilingual Comparable Corpora and its Application to Topic Tracking

Research Project

Project/Area Number 17500091
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution University of Yamanashi

Principal Investigator

FUKUMOTO Fumiyo  University of Yamanashi, Department of Research Interdisciplinary Graduate School of Medicine and Engineering, Associate Professor (60262648)

Project Period (FY) 2005 – 2006
Keywords Comparable Corpora / Polysemous Word / Bilingual Terms / Topic Tracking / Semi-supervised Clustering / Linguistic Knowledge Acquisition
Research Abstract

With the exponential growth of information on the Internet, it is becoming increasingly difficult to find and organize relevant material. Topic Detection and Tracking (TDT) is a research area that addresses this problem and consists of five tasks: story link detection, clustering (topic detection), new event detection, story segmentation, and topic tracking. The last task, topic tracking, is the focus of this work. Topic tracking starts from a few sample stories and finds all subsequent stories that discuss the target topic. Here, a topic in the TDT context is something that happens at a specific place and time and is associated with some specific actions.
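As a rough illustration of the tracking task itself, the following minimal sketch builds a word-count profile from a few sample stories and keeps later stories whose cosine similarity to that profile reaches a threshold. It is a generic baseline under stated assumptions, not the method of this project: the function names, the whitespace tokenizer, and the threshold value of 0.2 are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into term counts (an assumed, very crude tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track(sample_stories, stream, threshold=0.2):
    """Return the stories in `stream` that resemble the topic of `sample_stories`.

    The topic profile is the summed term counts of the sample stories;
    the threshold is an illustrative value, not a tuned one.
    """
    profile = Counter()
    for story in sample_stories:
        profile.update(bag_of_words(story))
    return [s for s in stream if cosine(profile, bag_of_words(s)) >= threshold]


samples = ["earthquake hits kobe city", "kobe earthquake rescue teams"]
stream = ["rescue teams search kobe after earthquake", "stock market rises in tokyo"]
on_topic = track(samples, stream)
```

With only a handful of sample stories the profile is sparse, which is the skewed-data problem the project addresses.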
In this work, we address the problem of skewed data in topic tracking, namely the small number of stories labeled positive compared with the number of negative stories, and propose a method for estimating effective training stories for the topic tracking task. To compensate for the small number of labeled positive stories, we use bilingual comparable corpora, i.e., English and Japanese corpora, together with the EDR bilingual dictionary, and extract story pairs consisting of positive stories and their associated stories. To deal with the large number of labeled negative stories, we classify them into clusters using a semi-supervised clustering algorithm that combines k-means with EM. The method was tested on the TDT English corpus, and the results showed that the system works well when the tracked topic concerns an event originating in the source-language country, even with a small number of initial positive training stories.
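One way to picture a semi-supervised clustering step that combines k-means with EM is sketched below. This is an assumption-laden illustration, not the authors' implementation: labeled points act as fixed seeds, a hard k-means phase initializes the cluster centers, and an EM phase then refines soft assignments under spherical Gaussians with a shared, fixed variance. The function names, the `var` parameter, and the requirement that every cluster has at least one seed are all hypothetical choices.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points, weights):
    """Weighted mean of the points; the weights must not sum to zero."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(dim)]


def seeded_kmeans_em(points, seeds, k, kmeans_iters=10, em_iters=20, var=1.0):
    """Cluster `points` into k groups; seeds[i] is a cluster id or None if unlabeled.

    Assumes every cluster id in range(k) appears at least once in `seeds`.
    """
    def one_hot(j):
        return [1.0 if c == j else 0.0 for c in range(k)]

    def update_centers(resp):
        return [weighted_mean(points, [r[j] for r in resp]) for j in range(k)]

    # Initial centers come from the seeded (labeled) points only.
    centers = update_centers([one_hot(s) if s is not None else [0.0] * k for s in seeds])

    # Phase 1: k-means with hard assignments; seeded points never change cluster.
    for _ in range(kmeans_iters):
        resp = []
        for p, s in zip(points, seeds):
            j = s if s is not None else min(range(k), key=lambda c: sq_dist(p, centers[c]))
            resp.append(one_hot(j))
        centers = update_centers(resp)

    # Phase 2: EM with soft assignments under spherical Gaussians of variance `var`.
    priors = [1.0 / k] * k
    for _ in range(em_iters):
        resp = []
        for p, s in zip(points, seeds):
            if s is not None:
                resp.append(one_hot(s))  # E-step is clamped for labeled points
                continue
            logs = [math.log(priors[j]) - sq_dist(p, centers[j]) / (2 * var) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]  # subtract max for numerical stability
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: re-estimate centers and mixing weights from the responsibilities.
        centers = update_centers(resp)
        priors = [sum(r[j] for r in resp) / len(points) for j in range(k)]

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, centers


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]]
seeds = [0, None, None, 1, None, None]
labels, centers = seeded_kmeans_em(points, seeds, k=2)
```

In the setting described above, the points would be vector representations of negative stories; how those vectors are built and how the number of clusters is chosen is not specified here and is left open in the sketch.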

  • Research Products

    (12 results)


  • [Journal Article] Using Comparable Corpora and Semi-Supervised Clustering for Topic Tracking (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

      Pages: 231-238

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Trans. of IEICE Information and Systems E89-D, 4

      Pages: 1543-1554

    • Description
      From the Final Research Report Summary (Japanese)
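  • [Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Trans. of IEICE Information and Systems J89-D, 3

      Pages: 552-566

    • Description
      From the Final Research Report Summary (Japanese)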
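  • [Journal Article] Correcting Category Errors in Multi-Labeled Data based on the Similarity between Two Examples (2006)

    • Author(s)
      H. Hamano, F. Fukumoto
    • Journal Title

      Trans. of IEICE Information and Systems J89-D, 10

      Pages: 2338-2347

    • Description
      From the Final Research Report Summary (Japanese)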
  • [Journal Article] Using Comparable Corpora and Semi-Supervised Clustering for Topic Tracking (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

      Pages: 231-238

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Generating Category Hierarchy for Classifying Large Corpora (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Trans. of IEICE Information and Systems E89-D, 4

      Pages: 1543-1554

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Using Category Hierarchies for Correcting Category Errors and its Application to Text Classification (2006)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Trans. of IEICE Information and Systems J89-D, 3

      Pages: 552-566

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Correcting Category Errors in Multi-Labeled Data based on the Similarity between Two Examples (2006)

    • Author(s)
      H. Hamano, F. Fukumoto
    • Journal Title

      Trans. of IEICE Information and Systems J89-D, 10

      Pages: 2338-2347

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Using Category Hierarchies for Correcting Errors in Multi-Labeled Data (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 2nd Language and Technology Conference (LTC'05)

      Pages: 211-215

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Topic Tracking based on Linguistic Features (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05)

      Pages: 10-21

    • Description
      From the Final Research Report Summary (Japanese)
  • [Journal Article] Using Category Hierarchies for Correcting Errors in Multi-Labeled Data (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 2nd Language and Technology Conference (LTC'05)

      Pages: 211-215

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Topic Tracking based on Linguistic Features (2005)

    • Author(s)
      F. Fukumoto, Y. Suzuki
    • Journal Title

      Proc. of the 2nd International Joint Conference on Natural Language Processing (IJCNLP'05)

      Pages: 10-21

    • Description
      From the Final Research Report Summary (English)

Published: 2008-05-27  
