2001 Fiscal Year Final Research Report Summary
A theoretical and practical investigation to construct a lexicon for analyzing German text databases.
Project/Area Number |
09610522
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
German language and literature
|
Research Institution | GIFU KEIZAI UNIVERSITY |
Principal Investigator |
YAMADA Yoshihisa, GIFU KEIZAI UNIVERSITY, Faculty of Business Administration, Professor (50192406)
|
Project Period (FY) |
1997 – 2000
|
Keywords | natural language processing / corpus / text database / software |
Research Abstract |
To obtain morphosyntactic information such as part of speech from a plain-text corpus, the syntactic structure must be parsed to some extent. For this purpose, this research designed a lexicon for analyzing text databases and implemented it as software. Grimm's fairy tales served as the base material. The concrete results of the research are as follows.
1. Continuation of the Grimm corpus: The 1812 and 1819 editions of the fairy tales of the Brothers Grimm were digitized. These data were reorganized into a Grimm corpus that also includes the existing 1857 edition. Although relatively small in scale, it could be called the first diachronic corpus of German.
2. Completion of the lemma frequency list: The lemma frequency list for the 1857 edition, which contains more than 220,000 words, was completed. Compared with a simple word frequency list, a lemma frequency list is intricate work, especially for inflectional languages such as German, because every inflected form must be assigned to its base form. It is therefore an innovative experiment and can be valuable in areas such as linguistics, lexicography, and stylistics.
3. Completion of the corpus analysis software TEDDY II: The software TEDDY II, which implements ASA (Auflösungsstrategie der Ambiguität, a strategy for resolving ambiguity), was completed. The user interface and the display of the output were also improved compared with the previous version of TEDDY.
|
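The difference between a simple word frequency list and a lemma frequency list, as described in the abstract, can be illustrated with a minimal sketch. This is not the method used by TEDDY II, whose internals are not described here; the lookup table `LEMMA_TABLE`, the function names, and the sample tokens are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lookup table mapping inflected German forms to lemmas.
# A real lexicon would be far larger and would need to handle ambiguous forms.
LEMMA_TABLE = {
    "ging": "gehen",
    "geht": "gehen",
    "gegangen": "gehen",
    "könig": "könig",
    "königs": "könig",
    "könige": "könig",
}


def word_frequencies(tokens):
    """Count each surface form separately (simple word frequency list)."""
    return Counter(token.lower() for token in tokens)


def lemma_frequencies(tokens, table=LEMMA_TABLE):
    """Count tokens after mapping each form to its lemma.

    Forms missing from the table are counted under their lowercased form.
    """
    return Counter(table.get(token.lower(), token.lower()) for token in tokens)


# Illustrative token list, not taken from the Grimm corpus.
tokens = ["Der", "König", "ging", "des", "Königs", "geht", "gegangen"]

word_counts = word_frequencies(tokens)
lemma_counts = lemma_frequencies(tokens)
```

In the word list every form counts once, while the lemma list groups the three forms of "gehen" and the two forms of "König". A plain table lookup like this cannot decide between readings of an ambiguous form, which is the kind of problem a disambiguation strategy such as ASA is meant to address.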