Unsupervised Segmentation and Annotation of Texts
Project/Area Number |
24650065
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | Kyushu University |
Principal Investigator |
|
Project Period (FY) |
2012-04-01 – 2016-03-31
|
Project Status |
Completed (Fiscal Year 2015)
|
Budget Amount *help |
¥3,900,000 (Direct Cost: ¥3,000,000、Indirect Cost: ¥900,000)
Fiscal Year 2014: ¥1,690,000 (Direct Cost: ¥1,300,000、Indirect Cost: ¥390,000)
Fiscal Year 2013: ¥1,300,000 (Direct Cost: ¥1,000,000、Indirect Cost: ¥300,000)
Fiscal Year 2012: ¥910,000 (Direct Cost: ¥700,000、Indirect Cost: ¥210,000)
|
Keywords | 自然言語処理 / 形態素解析 / 教師無し学習 / 圧縮 / Bayes手法 / 教師なし機械学習 / パターン抽出 / オートマトン / 文書分割 |
Outline of Final Research Achievements |
This project aims at construction of unsupervized methods for automatic segmentation/annotation of given texts, a fundamental procedure of natural language processing. In addition to lemmatization, other tasks requring segmentation/annotation are also considered. Three achievements are obtained. First, using compression, we constructed an algorithm for detecting text subparts in other languages than the main text. Through a large scale experiment, the method was shown to work with a high accuracy applicable to text preprocessing. Second, the edit distance procedure was extended by Bayes method, and was applied to aligned corpora, to obtain translation pairs. Third, by use of minimal automaton, the patterns underlying sentences are detected, which serves for defining the segments within the sentence and further grouping of similarly used text parts.
|
Report
(5 results)
Research Products
(5 results)