Basic research concerning adjacency probabilities in the development of a morphological analysis dictionary for classical Japanese poetry
Project/Area Number |
22520458
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Japanese linguistics
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
|
Project Period (FY) |
2010-04-01 – 2013-03-31
|
Project Status |
Completed (Fiscal Year 2013)
|
Budget Amount |
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2012: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2011: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2010: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
|
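Keywords | waka / diachronic analysis / classical Japanese dictionary / morpheme / network analysis / lexicology / adjacency rules / machine learning / dictionary / adjacency / Japanese language / Heian period / analysis system / classical Japanese / morphological analysis / ancient-language dictionary / diachronic change / part-of-speech system / Hachidaishu / thesaurus / diachronic language / lexicological topology |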
Research Abstract |
In 2007 the principal investigator developed a tool for the morphological analysis of waka poems, but its applicability was limited to the Hachidaishu. The goal of the present research was to automatically segment the Nijuichidaishu and annotate it with part-of-speech tags, using the previously annotated segmentation data and the token adjacency probabilities of the Hachidaishu. With the KyTea (Kyoto Text Analysis Toolkit) morpheme segmentation toolkit and its default L2-regularized SVM learning algorithm, model training took less than a minute, and the resulting model achieved a high segmentation accuracy of around 96% on the Nijuichidaishu. Some work remains, namely adding unknown tokens and learning adjacency probabilities around unknown words, but the development of a dictionary that can segment the Nijuichidaishu with high accuracy can be considered complete.
|
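As an illustration of two notions the abstract relies on, the following minimal Python sketch estimates token adjacency probabilities from already segmented lines and scores a predicted segmentation against a gold one. It is not the project's code and does not call KyTea. The function names, the toy tokens, and the choice of a boundary-level accuracy measure are all assumptions made for this example; the abstract does not state how the figure of around 96% was computed.

```python
from collections import Counter


def adjacency_probabilities(segmented_lines):
    """Estimate P(right | left) for adjacent tokens by relative frequency."""
    pair_counts = Counter()
    left_counts = Counter()
    for tokens in segmented_lines:
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def boundary_flags(tokens):
    """One flag per gap between characters: True where a token ends.

    Assumes every token is a non-empty string.
    """
    flags = []
    for token in tokens:
        flags.extend([False] * (len(token) - 1))
        flags.append(True)
    return flags[:-1]  # the end-of-line boundary is always present, so drop it


def boundary_accuracy(gold_lines, predicted_lines):
    """Share of character gaps on which gold and prediction agree (assumed metric)."""
    correct = total = 0
    for gold, pred in zip(gold_lines, predicted_lines):
        g, p = boundary_flags(gold), boundary_flags(pred)
        if len(g) != len(p):
            raise ValueError("gold and predicted lines must cover the same characters")
        correct += sum(a == b for a, b in zip(g, p))
        total += len(g)
    return correct / total if total else 0.0


# Toy data, invented for illustration only.
gold = [["春", "の", "夜", "の", "夢"], ["秋", "の", "夜"]]
probs = adjacency_probabilities(gold)
score = boundary_accuracy([["春", "の", "夜"]], [["春の", "夜"]])
```

In this toy data the token の is followed by 夜 twice and by 夢 once, so the estimated probability of 夜 after の is 2/3. The predicted segmentation misses one of the two internal boundaries, giving a boundary accuracy of 0.5.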
Report
(4 results)
Research Products
(48 results)
[Journal Article] 八代集用語のモデリングシステム (2010)
Author(s)
山元啓史
Journal Title
じんもんこん2010, 人文科学とコンピュータシンポジウム (情報処理学会)
Volume: Vol. 2010, No. 15
NAID
Related Report
Peer Reviewed
[Presentation] 通時コーパスと言語空間論 (2012)
Author(s)
山元啓史, 田中牧郎, 近藤泰弘
Organizer
コーパス日本語学ワークショップ,コーパス日本語学ワークショップ予稿集,国立国語研究所言語資源研究系・コーパス開発センター, Vol. 1, No. 1
Place of Presentation
Tokyo: National Institute for Japanese Language and Linguistics
Related Report