Budget Amount |
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2014: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2013: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2012: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
|
Outline of Final Research Achievements |
This project aimed to construct unsupervised methods for the automatic segmentation and annotation of texts, a fundamental procedure in natural language processing. In addition to lemmatization, other tasks requiring segmentation or annotation were also considered. Three results were obtained. First, using compression, we constructed an algorithm for detecting passages of a text that are written in languages other than the main language. A large-scale experiment showed that the method is accurate enough to be used for text preprocessing. Second, the edit distance procedure was extended with a Bayesian method and applied to aligned corpora to obtain translation pairs. Third, using minimal automata, the patterns underlying sentences were detected; these patterns serve to define the segments within a sentence and to group text parts that are used similarly.
|
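For reference, the classical edit distance on which the second result builds can be sketched as follows. This is only the standard Levenshtein dynamic program with unit costs, not the project's Bayesian extension, whose details are not given in this outline; the function name `edit_distance` is a hypothetical choice for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical Levenshtein distance with unit costs (illustrative baseline only)."""
    # prev[j] holds the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (ca != cb)  # zero cost when characters match
            deletion = prev[j] + 1                   # drop ca from a
            insertion = curr[j - 1] + 1              # insert cb into a
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

A probabilistic variant would presumably replace the fixed unit costs with operation weights estimated from data, but how the project did this is an assumption beyond what the outline states.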