Unsupervised Segmentation and Annotation of Texts

Research Project

Project/Area Number 24650065
Research Category

Grant-in-Aid for Challenging Exploratory Research

Allocation Type Multi-year Fund
Research Field Intelligent informatics
Research Institution Kyushu University

Principal Investigator

Tanaka-Ishii Kumiko  Kyushu University, Graduate School (Faculty) of Information Science and Electrical Engineering, Professor (10323528)

Project Period (FY) 2012-04-01 – 2016-03-31
Project Status Completed (Fiscal Year 2015)
Budget Amount
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2014: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2013: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2012: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
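Keywords Natural language processing / Morphological analysis / Unsupervised learning / Compression / Bayesian methods / Unsupervised machine learning / Pattern extraction / Automata / Text segmentation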
Outline of Final Research Achievements

This project aimed to construct unsupervised methods for the automatic segmentation and annotation of given texts, a fundamental procedure in natural language processing. In addition to lemmatization, other tasks requiring segmentation or annotation were also considered. Three results were obtained. First, using compression, we constructed an algorithm for detecting parts of a text written in languages other than that of the main text. A large-scale experiment showed that the method is accurate enough to be applied to text preprocessing. Second, the edit distance procedure was extended with a Bayesian method and applied to aligned corpora to obtain translation pairs. Third, using a minimal automaton, the patterns underlying sentences were detected, which serves to define the segments within a sentence and to group text parts that are used similarly.
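
The second result builds on the edit distance procedure. As a point of reference only, the classic (non-Bayesian) edit distance can be sketched as follows. The function name `edit_distance` and the unit costs are illustrative assumptions; the project's Bayesian extension and its application to aligned corpora are not reproduced here.

```python
def edit_distance(source: str, target: str) -> int:
    """Classic Levenshtein distance with unit costs (illustrative baseline only)."""
    # previous[j] = distance between the already processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

A probabilistic variant would presumably replace the fixed unit costs with learned operation probabilities, but how the project did this is not stated in the summary above.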

Report

(5 results)
  • 2015 Annual Research Report   Final Research Report (PDF)
  • 2014 Research-status Report
  • 2013 Research-status Report
  • 2012 Research-status Report
  • Research Products

    (5 results)

All Journal Article (2 results) (of which Peer Reviewed: 2 results) Presentation (1 results) Book (2 results)

  • [Journal Article] Sentence Hedge Detection without Cue Annotation: A Heuristic Cue Selection Approach (2014)

    • Author(s)
      Andre Horie and Kumiko Tanaka-Ishii
    • Journal Title

      Journal of Natural Language Processing

      Volume: 21 Issue: 1 Pages: 27-40

    • DOI

      10.5715/jnlp.21.27

    • NAID

      130004566464

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2014 Research-status Report
    • Peer Reviewed
  • [Journal Article] Sentence hedge detection without cue annotation: A heuristic cue selection approach (2014)

    • Author(s)
      Andre Horie and Kumiko Tanaka-Ishii
    • Journal Title

      Journal of Natural Language Processing

      Volume: 21 Pages: 24-40

    • Related Report
      2013 Research-status Report
    • Peer Reviewed
  • [Presentation] Text Segmentation by Language Using Minimum Description Length (2012)

    • Author(s)
      Yamaguchi, Hiroshi and Kumiko Tanaka-Ishii
    • Organizer
      50th Annual Conference of the Association for Computational Linguistics
    • Place of Presentation
      Jeju Island, South Korea
    • Related Report
      2012 Research-status Report
  • [Book] Language Production, Cognition, and the Lexicon (2014)

    • Author(s)
      Kumiko Tanaka-Ishii
    • Total Pages
      586
    • Publisher
      Springer
    • Related Report
      2014 Research-status Report
  • [Book] Recent Advances in Language Production, Cognition and the Lexicon (2014)

    • Author(s)
      Kumiko Tanaka-Ishii
    • Publisher
      Springer
    • Related Report
      2013 Research-status Report

Published: 2013-05-31   Modified: 2019-07-29  
