Budget Amount
¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2005: ¥100,000 (Direct Cost: ¥100,000)
Fiscal Year 2004: ¥1,300,000 (Direct Cost: ¥1,300,000)

Research Abstract
Grammatical inference, i.e., the automatic synthesis of formal grammars from positive and negative samples, is an important research subject in machine learning. We have been working on learning general context-free grammars from sample strings, an approach implemented in the "Synapse" system. The main features of our approach are incremental learning, rule generation based on bottom-up parsing of positive samples, and search over sets of rules. During the project period, we improved the system by implementing several novel methods. The most important is a rule generation process called "bridging," which, from the result of parsing each positive string, synthesizes the production rules that supply any missing parts of an incomplete derivation tree. To address the fundamental complexity problem of learning CFGs, we employ methods that search for non-minimum, semi-optimum rule sets, as well as incremental learning based on related grammars. One such strategy, called serial search, finds semi-optimum rule sets by searching for additional rules for each positive sample in turn, rather than searching for the minimum rule set covering all positive samples as in global search. We also investigated extensions and applications of our approach to broader classes of grammars, including definite clause grammars (DCGs) and other logic grammars. Future work includes further improving Synapse so that it can synthesize more complex grammars in less time, and applying the approach to other areas of machine learning, including syntactic pattern recognition.
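
The following sketch is a rough illustration of the bridging and serial-search ideas described above, not the actual Synapse algorithm: the rule form (Chomsky normal form), the CYK-style parser, the fixed pool of nonterminals, and the depth-limited search are all simplifying assumptions made here for brevity. When a positive string cannot yet be derived, the search adds candidate rules one at a time, discarding any addition that lets a negative sample be derived, and it handles the positive samples one by one (serial search) rather than minimizing the rule set globally, so the learned set may contain redundant rules, in line with the semi-optimum (rather than minimum) character of the approach.

# Toy sketch of "bridging"-style rule generation with serial search.
# Assumptions made here (not from the original system): rules are in
# Chomsky normal form (A -> B C or A -> a), parsing is plain CYK, and
# candidate rules are drawn from a small fixed pool of nonterminals.

from itertools import product

NONTERMINALS = ["S", "A", "B"]   # assumed hypothesis space; start symbol S

def cyk_derives(rules, string, start="S"):
    """True if `string` is derivable from `start` under `rules`."""
    n = len(string)
    if n == 0:
        return False
    # table[i][j]: nonterminals deriving the substring of length j+1 at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(string):
        for lhs, rhs in rules:
            if rhs == (ch,):
                table[i][0].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                for lhs, rhs in rules:
                    if (len(rhs) == 2
                            and rhs[0] in table[i][split - 1]
                            and rhs[1] in table[i + split][span - split - 1]):
                        table[i][span - 1].add(lhs)
    return start in table[0][n - 1]

def candidate_rules(alphabet):
    """All CNF rules over the fixed nonterminals and the alphabet."""
    cands = []
    for lhs in NONTERMINALS:
        cands += [(lhs, (ch,)) for ch in alphabet]
        cands += [(lhs, pair) for pair in product(NONTERMINALS, repeat=2)]
    return cands

def bridge(rules, target, negatives, alphabet, depth, start=0):
    """Depth-limited search for extra rules that make `target` derivable
    while keeping every negative sample underivable."""
    if cyk_derives(rules, target):
        return rules
    if depth == 0:
        return None
    cands = candidate_rules(alphabet)
    for idx in range(start, len(cands)):
        cand = cands[idx]
        if cand in rules:
            continue
        extended = rules + [cand]
        if any(cyk_derives(extended, neg) for neg in negatives):
            continue                      # rejected by a negative sample
        found = bridge(extended, target, negatives, alphabet, depth - 1, idx + 1)
        if found is not None:
            return found
    return None

def serial_learn(positives, negatives, alphabet, depth=3):
    """Serial search: extend the rule set for one positive sample at a
    time, yielding a consistent but possibly non-minimum rule set."""
    rules = []
    for pos in positives:
        rules = bridge(rules, pos, negatives, alphabet, depth)
        if rules is None:
            raise ValueError("no consistent rule set within the depth limit")
    return rules

if __name__ == "__main__":
    # Toy samples; the learned rules are merely consistent with them.
    positives = ["ab", "aabb"]
    negatives = ["a", "b", "ba", "abb"]
    for lhs, rhs in serial_learn(positives, negatives, alphabet="ab"):
        print(lhs, "->", " ".join(rhs))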