Computerized Database of English Sentences and Its Statistic Analysis Pertaining to Instruction for the Development of Fundamental Reading Ability in Various Academic Fields

Research Project

Project/Area Number	05680194
Research Category	Grant-in-Aid for General Scientific Research (C)
Allocation Type	Single-year Grants
Research Field	Subject Education
Research Institution	Yamagata University
Principal Investigator	OKADA Takeshi Yamagata University, Faculty of General Education, Associate Professor (30185441)
Project Period (FY)	1993 – 1994
Project Status	Completed (Fiscal Year 1994)
Budget Amount	¥1,400,000 (Direct Cost: ¥1,400,000) Fiscal Year 1994: ¥800,000 (Direct Cost: ¥800,000) Fiscal Year 1993: ¥600,000 (Direct Cost: ¥600,000)
Keywords	Database / Personal Computer / Reading Ability / Computational English Linguistics / Machine-readable dictionary / Statistical Analysis / Computer program / Ministry of Education / English database / Natural language syntactic parsing / Corpus processing / Character recognition / Genre-specific English expressions / English verb distribution
Research Abstract	Over the two years of this research we obtained two main sets of results. Construction of the Database: (1) We developed a high-speed, high-accuracy character recognition system. The system combines several computer programs and machine-readable dictionaries that compensate for the deficiencies of commercial optical character recognition software. It "recognizes" English character strings not from data transferred directly from the image scanner, but from image data files obtained via other software. With this system, a large amount of fundamental English sentence data can be obtained more efficiently. (2) The introduction of a computer network system made it much easier to obtain fundamental English data. (3) The sentence data were formatted into a uniform database style (one sentence per record) suitable for subsequent computational processing. Analyses of the Database: (1) Concordance processing can be performed on the database sentences. In addition, we investigated the usage patterns of the English verbs that appear in textbooks for Japanese junior high schools. (2) We built an automatic tag assignment program named TAGASS (TAG ASSigner), which annotates every word in the database sentences with its possible grammatical marker(s). (3) We completed a program that selects the single preferred tag from the set of candidate tags. This program, which achieves more than 80% accuracy, is based on a probabilistic-statistical algorithm. In future research we will investigate how this algorithm can best be combined with a grammar-based algorithm.
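
As an illustration of the concordance processing mentioned in the abstract, the following is a minimal keyword-in-context (KWIC) sketch over a database holding one sentence per record. It is not the project's own program. The function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical and chosen only for this example.

```python
import re


def kwic(sentences, keyword, width=30):
    """Return (record_no, left_context, hit, right_context) tuples.

    Assumption: `sentences` is a list in which each item is one
    database record holding exactly one sentence.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for record_no, sentence in enumerate(sentences, start=1):
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            # Right-align the left context so the hits line up in a column.
            lines.append((record_no, left.rjust(width), match.group(0), right))
    return lines


# Hypothetical sample records, not taken from the project's database.
sentences = [
    "I play tennis every day.",
    "She plays the piano.",
    "Do you play the guitar?",
]

for record_no, left, hit, right in kwic(sentences, "play"):
    print(f"{record_no:>3} {left} [{hit}] {right}")
```

Because the pattern matches whole words only, the inflected form "plays" in the second record is not reported. A study of verb usage would probably need lemmatization or tag information on top of a plain string match like this one.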