Discriminant Analysis with Incomplete Data and its Application to Estimation of Words' Cooccurrency
Project/Area Number |
13680450
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Kyushu University |
Principal Investigator |
TOMIURA Yoichi, Kyushu University, Graduate School of Information Science and Electrical Engineering, associate professor (10217523)
|
Co-Investigator(Kenkyū-buntansha) |
TANAKA Shosaku, Kyushu University, Computing and Communications Center, assistant (00325549)
HITAKA Toru, Kyushu University, Graduate School of Information Science and Electrical Engineering, professor (30037931)
|
Project Period (FY) |
2001 – 2003
|
Project Status |
Completed (Fiscal Year 2003)
|
Budget Amount |
¥3,500,000 (Direct Cost: ¥3,500,000)
Fiscal Year 2003: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 2002: ¥2,300,000 (Direct Cost: ¥2,300,000)
Fiscal Year 2001: ¥600,000 (Direct Cost: ¥600,000)
|
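Keywords | Words' Cooccurrency / Multiple regression model / Vectorial expression of word / Knowledge Acquisition / Syntactic Disambiguation / Natural Language Processing / Lexical cooccurrence / Disambiguation of syntactic structure / Vector representation of words / Syntactic parsing / Judgment of word cooccurrence / Selectional restrictions / Meaning acquisition |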
Research Abstract |
Words' cooccurrency is a basic kind of knowledge in Natural Language Processing and is used for syntactic disambiguation and word sense disambiguation. However, the tuples of words that can co-occur are too numerous to collect exhaustively, even from a huge tagged corpus. This project proposed a new method, based on the multiple regression model, for estimating words' cooccurrency from a tagged corpus. The independent variables of this model correspond to satellite words. Unlike ordinary multiple regression analysis, the independent variables are themselves parameters of the model, and tuples of words that are not observed in the corpus are used as negative data with a degree of confidence. We ran experiments on estimating the cooccurrency between Japanese nouns and verbs through postpositional particles using the EDR corpus, and evaluated the estimated cooccurrencies in the following two ways: 1. a direct evaluation, investigating the distribution of cooccurrencies of word tuples that are not observed in the corpus used for learning but are observed in another corpus; 2. an indirect evaluation, through an experiment on syntactic disambiguation using the estimated cooccurrencies.
|
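The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the project's actual formulation. It assumes a bilinear score (a learned noun vector dotted with a learned verb coefficient vector), a target of 1 for observed pairs, and a target of 0 with a reduced weight for unobserved pairs. The function name `fit_cooccurrence`, the parameter `confidence`, the toy word lists, and the use of plain gradient descent are all hypothetical choices made for illustration.

```python
import random


def fit_cooccurrence(nouns, verbs, observed, dim=2, confidence=0.3,
                     steps=3000, lr=0.05, seed=0):
    """Fit noun and verb vectors by weighted least squares (illustrative only).

    Observed pairs are positive data (target 1, weight 1). Unobserved pairs
    are treated as negative data (target 0) with weight `confidence` < 1,
    which is an assumed reading of "negative data with a degree of confidence".
    """
    rng = random.Random(seed)
    # Both sides are learned: the "independent variables" (noun vectors)
    # are parameters too, unlike ordinary multiple regression.
    noun_vec = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nouns}
    verb_vec = {v: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for v in verbs}

    def score(n, v):
        return sum(a * b for a, b in zip(noun_vec[n], verb_vec[v]))

    def target_and_weight(n, v):
        return (1.0, 1.0) if (n, v) in observed else (0.0, confidence)

    def loss():
        total = 0.0
        for n in nouns:
            for v in verbs:
                target, weight = target_and_weight(n, v)
                total += weight * (target - score(n, v)) ** 2
        return total

    history = [loss()]
    for _ in range(steps):
        grad_n = {n: [0.0] * dim for n in nouns}
        grad_v = {v: [0.0] * dim for v in verbs}
        for n in nouns:
            for v in verbs:
                target, weight = target_and_weight(n, v)
                err = 2.0 * weight * (score(n, v) - target)
                for j in range(dim):
                    grad_n[n][j] += err * verb_vec[v][j]
                    grad_v[v][j] += err * noun_vec[n][j]
        # Full-batch gradient step on both sets of vectors.
        for n in nouns:
            for j in range(dim):
                noun_vec[n][j] -= lr * grad_n[n][j]
        for v in verbs:
            for j in range(dim):
                verb_vec[v][j] -= lr * grad_v[v][j]
    history.append(loss())
    return score, history


# Toy data, invented for illustration (not taken from the EDR corpus).
nouns = ["water", "tea", "bread", "rice"]
verbs = ["drink", "eat"]
observed = {("water", "drink"), ("tea", "drink"), ("bread", "eat")}

score, history = fit_cooccurrence(nouns, verbs, observed)
print("loss before/after:", history[0], history[-1])
print("water-drink:", score("water", "drink"))
print("water-eat:", score("water", "eat"))
```

The reduced weight on unobserved pairs reflects that absence from a corpus is weaker evidence than presence: a pair may be missing simply because the corpus is too small, so the fit is penalized less for scoring it above zero.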
Report
(4 results)
Research Products
(15 results)