Budget Amount |
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2012: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2011: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2010: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
|
Research Abstract |
The principal investigator developed a tool for the morphological processing of waka poems in 2007, but its range of applicability was limited to the Hachidaishu. The goal of the present research is to automatically segment the Nijuichidaishu and annotate it with part-of-speech tags, using the previously annotated segmentation data and token adjacency probabilities of the Hachidaishu. With the KyTea (Kyoto Text Analysis Toolkit) morpheme segmentation toolkit and its default L2-regularized SVM learning algorithm, model training took less than a minute. The resulting model achieved a high segmentation accuracy of around 96% on the Nijuichidaishu. Some work remains on adding unknown tokens and on learning adjacency probabilities around unknown words, but the development of a dictionary that can segment the Nijuichidaishu with high accuracy can be considered complete.
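The abstract does not state how the segmentation accuracy of around 96% was measured. As a minimal sketch, assuming a boundary-level metric (the fraction of character gaps where the predicted segmentation agrees with the gold segmentation on whether a word boundary is present), the calculation could look like the following. The function names `boundaries` and `boundary_accuracy` are hypothetical, and the sample line and its segmentations are illustrative only, not taken from the project's data.

```python
def boundaries(segmented: str) -> list[bool]:
    """For each gap between adjacent characters, report whether a word boundary falls there.

    `segmented` is a line whose words are separated by whitespace.
    """
    tokens = segmented.split()
    flags: list[bool] = []
    for i, token in enumerate(tokens):
        # Gaps inside a token are never boundaries.
        flags.extend([False] * (len(token) - 1))
        # The gap between this token and the next one is a boundary.
        if i < len(tokens) - 1:
            flags.append(True)
    return flags


def boundary_accuracy(gold: str, predicted: str) -> float:
    """Fraction of character gaps on which gold and predicted segmentations agree."""
    gold_flags = boundaries(gold)
    pred_flags = boundaries(predicted)
    if len(gold_flags) != len(pred_flags):
        raise ValueError("gold and predicted lines must contain the same characters")
    if not gold_flags:
        return 1.0  # a line of zero or one character has no gaps to get wrong
    agreements = sum(g == p for g, p in zip(gold_flags, pred_flags))
    return agreements / len(gold_flags)


# Illustrative segmentations of one line (assumed, not project data).
gold_line = "ひさかた の ひかり のどけき"
pred_line = "ひさかた の ひかり のどけ き"
accuracy = boundary_accuracy(gold_line, pred_line)
```

Here the predicted line disagrees with the gold line at one of the eleven character gaps, so `accuracy` is 10/11. A word-level F-measure is another common choice and would give a different figure for the same output.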
|