2015 Fiscal Year Final Research Report
Unsupervised Segmentation and Annotation of Texts
Project/Area Number |
24650065
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | Kyushu University |
Principal Investigator |
Tanaka-Ishii Kumiko (田中久美子) Kyushu University, Graduate School (Faculty) of Information Science and Electrical Engineering, Professor (10323528)
|
Project Period (FY) |
2012-04-01 – 2016-03-31
|
Keywords | Natural language processing / Morphological analysis / Unsupervised learning / Compression / Bayesian methods |
Outline of Final Research Achievements |
This project aimed to construct unsupervised methods for the automatic segmentation and annotation of texts, a fundamental procedure in natural language processing. In addition to lemmatization, other tasks requiring segmentation and annotation were considered. Three results were obtained. First, using compression, we constructed an algorithm for detecting parts of a text written in languages other than that of the main text. A large-scale experiment showed that the method is accurate enough to be applied to text preprocessing. Second, the edit distance procedure was extended with a Bayesian method and applied to aligned corpora to obtain translation pairs. Third, by means of a minimal automaton, the patterns underlying sentences were detected; these patterns serve to define the segments within a sentence and to group text parts that are used similarly.
|
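As an illustration of the compression idea in the first result, the following is a minimal sketch, not the project's actual algorithm. It assumes that a segment is "closest" to the language whose reference sample it extends most cheaply under a general-purpose compressor (here zlib). The names `REFERENCES`, `added_bytes`, and `closest_language` and the two toy samples are hypothetical and were chosen only for this example.

```python
import zlib

# Toy reference samples (hypothetical). A realistic setup would use a large
# corpus per language; these short strings only illustrate the mechanics.
REFERENCES = {
    "en": "the quick brown fox jumps over the lazy dog and "
          "the cat sat on the mat while the rain fell on the town",
    "fr": "le vent souffle dans les arbres et "
          "le chat dort sur le tapis pendant que la pluie tombe sur la ville",
}


def added_bytes(reference: str, segment: str) -> int:
    """Extra compressed bytes needed when `segment` is appended to `reference`."""
    base = len(zlib.compress(reference.encode("utf-8")))
    joint = len(zlib.compress((reference + segment).encode("utf-8")))
    return joint - base


def closest_language(segment: str, references: dict) -> str:
    """Return the language key whose reference compresses `segment` most cheaply."""
    return min(references, key=lambda lang: added_bytes(references[lang], segment))


if __name__ == "__main__":
    for text in ("the cat sat on the mat while the rain fell",
                 "le chat dort sur le tapis pendant que la pluie"):
        print(text, "->", closest_language(text, REFERENCES))
```

A detector for foreign-language subparts could, under the same assumption, slide a window over the text and flag windows whose closest language differs from that of the main text. How the project actually chose window boundaries and compressors is not stated in this summary.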
Free Research Field |
Natural Language Processing
|