2002 Fiscal Year Final Research Report Summary
Study on Integration of Statistical Information and Linguistic Constraint Information
Project/Area Number |
12480089
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | NARA INSTITUTE OF SCIENCE AND TECHNOLOGY |
Principal Investigator |
MATSUMOTO Yuji Nara Institute of Science and Technology, Graduate School of Information Science, professor (10211575)
|
Co-Investigator(Kenkyū-buntansha) |
OHTANI Akira Osaka Gakuin University, Faculty of Informatics, lecturer (50283817)
MIYAMOTO Edson Nara Institute of Science and Technology, Graduate School of Information Science, assistant professor (60335479)
INUI Kentaro Nara Institute of Science and Technology, Graduate School of Information Science, associate professor (60272689)
MIYATA Takashi Nara Institute of Science and Technology, Graduate School of Information Science, assistant professor (currently: National Institute of Advanced Industrial Science and Technology, researcher) (00283929)
|
Project Period (FY) |
2000 – 2002
|
Keywords | Head-driven Phrase Structure Grammar / Constraint-based Grammar Formalism / Dependency Analysis / Morphological Analysis / Statistical Natural Language Processing / Machine Learning / Support Vector Machines / Integration of Statistical and Constraint Information |
Research Abstract |
As machine-readable linguistic data has grown, statistical natural language processing has been actively researched. However, most statistical natural language processing targets surface-level language processing and is not suited to detailed semantic analysis. Constraint-based grammar formalisms such as Head-driven Phrase Structure Grammar, on the other hand, attempt to describe linguistic phenomena as lexical knowledge, with most linguistic constraints stated in the lexicon. While such formalisms specify complicated linguistic information in a very modular way, they have the drawback that any input violating the linguistic constraints cannot be parsed at all. This research aimed to compensate for the drawbacks of both approaches by integrating the two mechanisms. We first implemented a robust, high-quality word-based dependency analyzer for sentences using statistical information. The constraint-based grammar formalism then receives the statistical dependency output and finds the possible interpretations consistent with the dependency structure. To achieve robust language processing, we implemented a constraint relaxation mechanism. We also implemented the ideas of type coercion and co-composition proposed in the Generative Lexicon, as well as a user interface for browsing intermediate processing information. For dependency analysis, we used Support Vector Machines to cope with a large-scale feature space and devised a deterministic bottom-up parsing algorithm for Japanese and English. We implemented part of a Japanese grammar based on Head-driven Phrase Structure Grammar. These statistical and constraint-based grammars and parsers run within the user interface we developed, which is intended for grammar developers and for users of the natural language processing system.
|
Research Products
(18 results)