2015 Fiscal Year Final Research Report

Unsupervised Segmentation and Annotation of Texts

Research Project

Project/Area Number 24650065
Research Category

Grant-in-Aid for Challenging Exploratory Research

Allocation Type Multi-year Fund
Research Field Intelligent informatics
Research Institution Kyushu University

Principal Investigator

Tanaka-Ishii Kumiko (田中久美子)  Kyushu University, Graduate School (Faculty) of Information Science and Electrical Engineering, Professor (10323528)

Project Period (FY) 2012-04-01 – 2016-03-31
Keywords Natural language processing / Morphological analysis / Unsupervised learning / Compression / Bayesian methods
Outline of Final Research Achievements

This project aimed to construct unsupervised methods for the automatic segmentation and annotation of texts, a fundamental procedure in natural language processing. In addition to lemmatization, other tasks requiring segmentation and annotation were considered. Three results were obtained. First, using compression, we constructed an algorithm for detecting parts of a text written in languages other than that of the main text. A large-scale experiment showed that the method is accurate enough to be used for text preprocessing. Second, the edit distance procedure was extended with a Bayesian method and applied to aligned corpora to obtain translation pairs. Third, using a minimal automaton, the patterns underlying sentences were detected; these patterns serve to define the segments within a sentence and to group text parts that are used similarly.
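
The second result builds on the edit distance procedure. As a reminder of the baseline that is being extended, the following is a minimal sketch of the standard Levenshtein distance with unit costs. It is not the project's method: the Bayesian extension and its application to aligned corpora are not reproduced here, and the function name `edit_distance` is chosen only for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the Levenshtein distance between two strings (unit costs)."""
    # Row 0 of the dynamic-programming table: turning the empty prefix of
    # `source` into the first j characters of `target` takes j insertions.
    previous = list(range(len(target) + 1))

    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of `source` into the empty string
        # takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1]
```

With these unit costs, `edit_distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion). A probabilistic extension would typically replace the fixed costs with learned ones, but how the project does this is not stated in the summary above.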

Free Research Field

Natural Language Processing

Published: 2017-05-10  
