Budget Amount
¥1,100,000 (Direct Cost: ¥1,100,000)
Fiscal Year 2002: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 2001: ¥500,000 (Direct Cost: ¥500,000)
1. Development of Sentence Compression Algorithm
The sentence compression problem was formulated as the problem of selecting an optimal subsequence of phrases from a given sentence. An efficient algorithm based on our dependency analysis technique was then developed to solve this problem.
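The idea of compressing a sentence by selecting a phrase subsequence that respects the dependency structure can be illustrated with a minimal sketch. This is not the paper's actual algorithm; it is a simplified greedy variant, assuming hypothetical inputs: each phrase has a precomputed significance score and a dependency head index (-1 for the root), and phrases are dropped lowest-score-first, leaves of the dependency tree only, until a length budget is met.

```python
def compress(phrases, heads, scores, max_len):
    """Greedy dependency-tree pruning (illustrative, not the original algorithm).

    phrases: list of phrase strings
    heads:   heads[i] = index of the phrase that phrase i modifies (-1 = root)
    scores:  significance value of each phrase
    max_len: character budget for the compressed sentence
    """
    kept = set(range(len(phrases)))

    def length():
        return sum(len(phrases[i]) for i in kept)

    while length() > max_len:
        # Only leaves (phrases no kept phrase modifies) may be removed,
        # so every kept phrase always keeps its dependency head.
        removable = [i for i in kept
                     if not any(heads[j] == i for j in kept if j != i)]
        kept.discard(min(removable, key=lambda i: scores[i]))

    return [phrases[i] for i in sorted(kept)]
```

Pruning only leaves guarantees the compressed output is still a well-formed dependency subtree, which mirrors the constraint that a selected phrase's modified phrase must also be selected.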
2. Estimation of inter-phrase dependency strength and phrase significance
Using about 34,000 sentences from the Kyoto University Corpus, the inter-phrase dependency strength was estimated for each pair of modifying and modified phrase classes, based on the statistical frequency of the inter-phrase dependency distance. In addition, a sentence compression experiment was conducted in which human subjects compressed 200 sentences. The results were analyzed statistically, and the retention rate for each phrase class was calculated. Based on this result, a phrase significance value was estimated for each phrase class.
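Both estimates reduce to relative-frequency counting over annotated data. The following is a minimal sketch, assuming hypothetical input formats: the treebank is given as (modifier class, head class, distance) tuples, and the human compression data as per-class retained/total counts. The function names are illustrative, not from the original work.

```python
from collections import Counter

def estimate_dependency_strength(dependencies):
    """Relative frequency P(distance | modifier class, head class),
    used as the inter-phrase dependency strength.

    dependencies: iterable of (modifier_class, head_class, distance) tuples
    observed in the corpus.
    """
    deps = list(dependencies)
    pair_total = Counter((m, h) for m, h, _ in deps)
    triple = Counter(deps)
    return {(m, h, d): c / pair_total[(m, h)]
            for (m, h, d), c in triple.items()}

def estimate_significance(retained, total):
    """Retention rate per phrase class from the human compression
    experiment, used as the phrase significance value."""
    return {cls: retained[cls] / total[cls] for cls in total}
```

In this formulation a phrase class whose phrases are almost always kept by human compressors receives a significance value near 1, and a dependency whose distance is typical for its class pair receives a high strength.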
3. Subjective Evaluation of Compressed Sentences
A subjective evaluation experiment was performed for sentences automatically compressed using the above algorithm together with the estimated inter-phrase dependency strengths and phrase significance values. In this experiment, 200 test sentences, different from those used in 2, were used. Five subjects evaluated the quality of the compressed sentences from the following points of view: (a) total impression, (b) retention of information, and (c) grammatical correctness. For comparison, the same kind of evaluation experiment was performed for sentences compressed by humans, and also by a random method. It was found that the quality of sentences compressed by the proposed method lies between that of human compression and that of random compression.
4. Segmentation of Long Sentences
Because long sentences are difficult to analyze syntactically, it is desirable to segment long sentences into shorter ones. In this work, a support vector machine (SVM) technique was applied to the problem. Vectors consisting of surface attribute values of relevant phrases were input to the SVM, and segmentation points were automatically estimated. As a result, a precision of 77% and a recall of 84% were obtained, and the correct sentence segmentation rate was 72%.