Project/Area Number |
08408009
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
TSUJII Junichi, The University of Tokyo, Graduate School of Science, Professor (20026313)
|
Co-Investigator(Kenkyū-buntansha) |
TORISAWA Kentaro, The University of Tokyo, Graduate School of Science, Research Associate (70282712)
|
Project Period (FY) |
1996 – 1998
|
Project Status |
Completed (Fiscal Year 1998)
|
Budget Amount |
¥22,100,000 (Direct Cost: ¥22,100,000)
Fiscal Year 1998: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1997: ¥9,800,000 (Direct Cost: ¥9,800,000)
Fiscal Year 1996: ¥10,300,000 (Direct Cost: ¥10,300,000)
|
Keywords | natural language processing / parsing algorithm / HPSG grammar / knowledge acquisition / subcategorization frame generation / semantic classification / grammar formalism / unification grammar / logic programming language / parsing / machine learning / statistical processing / grammar learning / parsing system / grammar / feature structure description language |
Research Abstract |
The aim of this research project was to develop parsers and grammars for natural languages such that parsing sentences and acquiring knowledge from those sentences are treated as a single process. We used Head-driven Phrase Structure Grammar (HPSG) as the core grammar formalism. To obtain a practical parser for HPSG, we designed a programming language, LiLFeS, and developed two types of compilers for it: a byte-code compiler and a native-code compiler. The parsers written in this language were quite efficient, and their parsing speed improved by a further factor of 2 to 10 when we introduced a technique for compiling HPSG into Context-Free Grammar (CFG). The final version of the parsers could parse a sentence from Japanese newspapers in 250 msec on average. We also developed grammars for English and Japanese. The Japanese grammar could analyze 98% of the sentences in newspaper articles, with a so-called "Kakari-Uke" (dependency) precision of around 80%. The English grammar was built by translating XTAG, a large English grammar in the tree adjoining grammar formalism. For our final goal, knowledge acquisition from texts, we developed two techniques. The first acquires subcategorization frames from unparsed texts; the second categorizes words into semantic classes using statistical techniques and the results of parsing.
|