1998 Fiscal Year Final Research Report Summary
Grammar Formalism with Self-Productivity and Development of Superhigh-speed Parsers for the Grammar Formalism
Project/Area Number |
08408009
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
TSUJII Junichi The University of Tokyo, Graduate School of Science, Professor (20026313)
|
Co-Investigator (Kenkyū-buntansha) |
TORISAWA Kentaro The University of Tokyo, Graduate School of Science, Research Associate (70282712)
|
Project Period (FY) |
1996 – 1998
|
Keywords | natural language processing / parsing algorithm / HPSG grammar / knowledge acquisition / subcategorization frame generation / semantic classification |
Research Abstract |
The aim of this research project was to develop parsers and grammars for natural languages in which parsing a sentence and acquiring knowledge from it are treated as the same process. We used Head-driven Phrase Structure Grammar (HPSG) as the core grammar formalism. To obtain a practical parser for HPSG, we designed a programming language, LiLFeS, and developed two types of compilers for it: a byte-code compiler and a native-code compiler. The parsers written in this language were quite efficient, and their parsing speed improved by a further factor of 2 to 10 when we introduced a technique for compiling HPSG into Context-Free Grammar (CFG). The final version of the parsers could parse a sentence from a Japanese newspaper in 250 ms on average. We also developed grammars for English and Japanese. The Japanese grammar could analyze 98% of the sentences in newspaper articles. The so-called "Kakari-Uke" precision was around 80%. The English grammar was built by translating XTAG, a large English grammar in the tree-adjoining grammar formalism. Toward our final goal, knowledge acquisition from texts, we developed two techniques: the first acquires subcategorization frames from unparsed texts, and the second categorizes words into semantic classes using statistical techniques and the results of parsing.
|
Research Products
(12 results)