2013 Fiscal Year Final Research Report
Basic research concerning adjacency probabilities in the development of a morphological analysis dictionary for classical Japanese poetry
Project/Area Number |
22520458
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Japanese linguistics
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
|
Project Period (FY) |
2010-04-01 – 2013-03-31
|
Keywords | waka / diachronic analysis / classical Japanese dictionary / morpheme / network analysis / lexicology / adjacency rules / machine learning |
Research Abstract |
In 2007 the principal investigator developed a tool for the morphological processing of waka poems, but its range of applicability was limited to the Hachidaishu. The goal of the present research is to automatically segment the Nijuichidaishu and annotate it with part-of-speech tags, using the existing annotated segmentation data and the token adjacency probabilities of the Hachidaishu. With the KyTea (Kyoto Text Analysis Toolkit) morpheme segmentation toolkit and its default L2-regularized SVM learning algorithm, model training took less than a minute. The resulting model achieved a high segmentation accuracy of around 96% on the Nijuichidaishu. Some work remains on adding unknown tokens and on learning adjacency probabilities around unknown words, but the development of a dictionary that can segment the Nijuichidaishu with high accuracy can be considered complete.
|
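As a rough illustration of the two quantities the abstract refers to, the sketch below estimates token adjacency probabilities from already-segmented lines and scores a predicted segmentation against a reference. It is a minimal sketch under stated assumptions, not the project's method: it does not call KyTea, the function names `adjacency_probabilities` and `boundary_accuracy` are hypothetical, the romanized toy lines stand in for real waka data, and the report does not state whether its accuracy figure was computed per boundary as done here.

```python
from collections import Counter


def adjacency_probabilities(segmented_lines):
    """Estimate P(right | left) for adjacent tokens in space-segmented lines.

    "<s>" and "</s>" are assumed start and end markers, not part of the source data.
    """
    pair_counts = Counter()
    left_counts = Counter()
    for line in segmented_lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def boundary_accuracy(gold, predicted):
    """Fraction of character gaps where both segmentations agree.

    Each gap between two characters is labelled True if a token boundary
    falls there. This per-gap metric is an assumption for illustration.
    """
    def boundaries(line):
        flags = []
        tokens = line.split()
        for i, token in enumerate(tokens):
            flags.extend([False] * (len(token) - 1))  # gaps inside a token
            if i < len(tokens) - 1:
                flags.append(True)  # gap between two tokens
        return flags

    gold_flags = boundaries(gold)
    predicted_flags = boundaries(predicted)
    if len(gold_flags) != len(predicted_flags):
        raise ValueError("gold and predicted must cover the same characters")
    if not gold_flags:
        return 1.0
    agree = sum(g == p for g, p in zip(gold_flags, predicted_flags))
    return agree / len(gold_flags)


# Toy, romanized training lines (hypothetical data).
training = ["haru no yo", "aki no yo"]
probs = adjacency_probabilities(training)
score = boundary_accuracy("haru no yo", "haru noyo")
```

Here `probs[("no", "yo")]` is the share of occurrences of "no" that are followed by "yo", and `score` is the share of character gaps on which the two segmentations agree.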
Research Products
(23 results)
[Presentation] Diachronic corpora and linguistic space theory (通時コーパスと言語空間論), 2012
Author(s)
山元啓史, 田中牧郎, 近藤泰弘
Organizer
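Corpus Japanese Linguistics Workshop (コーパス日本語学ワークショップ); workshop proceedings, Department of Corpus Studies and Center for Corpus Development, National Institute for Japanese Language and Linguistics, Vol. 1, No. 1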
Place of Presentation
Tokyo: National Institute for Japanese Language and Linguistics
Year and Date
2012-03-05 – 2012-03-06