Budget Amount |
¥2,200,000 (Direct Cost: ¥2,200,000)
Fiscal Year 2003: ¥1,100,000 (Direct Cost: ¥1,100,000)
Fiscal Year 2002: ¥1,100,000 (Direct Cost: ¥1,100,000)
|
Research Abstract |
The main aim of this research is to develop SAGE, a practical semantic analysis system. To this end, we worked to improve both the precision and the speed of the original prototype. To improve precision, we made four changes. (1) In addition to the statistical measures calculated from the EDR Corpus to determine the deep case between words, we added a rule-based procedure that determines the case between two words from information on particles, parts of speech, and word meanings. (2) We devised a way to determine the deep case between an indeclinable word and its modifier. (3) We made it possible to analyze the deep case of words not registered in the EDR Dictionary by replacing them with registered words that hold similar concepts. (4) We classified expressions containing brackets into three categories (supplementation, commission, and complement) and added bracket analysis to SAGE. With these techniques, the present system reaches a precision of 90.2% on word meaning and 90.0% on deep case for sentences from the EDR Corpus, and 87.0% and 86.8%, respectively, for news articles from the Internet. To improve speed, we devised a linear-order algorithm that decides word meaning and deep case in two steps. First, working bottom-up, we evaluate how the modified word affects the meaning of its modifier and represent that effect as a probability. Then we determine the meaning of each clause top-down. This algorithm yielded a speed-up of about 10,000 times. In addition, we reduced the time required to access the EDR Dictionary to one fifth of the original, and converting the whole system from Prolog to C yielded a further fivefold speed-up.
|
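The two-step procedure in the abstract (a bottom-up pass that scores meanings with probabilities, followed by a top-down pass that fixes them) can be illustrated with a minimal sketch. This is not the SAGE implementation, whose details the abstract does not give. It is a generic max-product pass over a dependency tree, and the `Node` class, the `compat` table, the default score of 0.1, and all probabilities below are hypothetical values chosen for illustration. Each node is visited once per pass, so the running time is linear in the number of words when the number of senses per word is bounded.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A word in a dependency tree; children are its modifiers (hypothetical structure)."""
    word: str
    senses: dict                                   # sense -> prior probability (made-up values)
    children: list = field(default_factory=list)
    best: dict = field(default_factory=dict)       # sense -> best score of the subtree
    choice: dict = field(default_factory=dict)     # sense -> best sense for each child
    chosen: str = None                             # filled in by the top-down pass

def compat(head_sense, mod_sense, table):
    # Hypothetical compatibility score between a head sense and a modifier sense.
    return table.get((head_sense, mod_sense), 0.1)

def bottom_up(node, table):
    """Pass 1: for every sense of a node, score the best assignment of its subtree."""
    for child in node.children:
        bottom_up(child, table)
    for sense, prior in node.senses.items():
        score, picks = prior, []
        for child in node.children:
            child_sense, child_score = max(
                ((c, child.best[c] * compat(sense, c, table)) for c in child.senses),
                key=lambda pair: pair[1],
            )
            score *= child_score
            picks.append(child_sense)
        node.best[sense] = score
        node.choice[sense] = picks

def top_down(node, sense=None):
    """Pass 2: fix the root's best sense, then propagate the recorded choices downward."""
    node.chosen = sense if sense is not None else max(node.best, key=node.best.get)
    for child, child_sense in zip(node.children, node.choice[node.chosen]):
        top_down(child, child_sense)

# Toy example: "bank" modifies "deposit"; all numbers are invented.
table = {("put_money", "finance"): 0.9, ("sediment", "river"): 0.8}
bank = Node("bank", {"finance": 0.5, "river": 0.5})
root = Node("deposit", {"put_money": 0.6, "sediment": 0.4}, children=[bank])

bottom_up(root, table)
top_down(root)
print(root.word, "->", root.chosen)
print(bank.word, "->", bank.chosen)
```

In this toy run the head's more probable sense pulls the modifier toward the compatible sense. A real system would replace the invented table with scores estimated from a corpus such as the EDR Corpus.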