Project/Area Number |
62450054
|
Research Category |
Grant-in-Aid for General Scientific Research (B)
|
Allocation Type | Single-year Grants |
Research Field |
Japanese Linguistics (国語学)
|
Research Institution | National Language Research Institute |
Principal Investigator |
HIDA Yoshifumi Director, Department of Language Change, NLRI (40000418)
|
Co-Investigator (Kenkyū-buntansha) |
KATOO Nobuaki (加藤 信明) Researcher, Section for Dictionary Research, NLRI
SAITOO Hidenori Head, 3rd Research Section, Department of Computational Linguistics, NLRI (70000429)
KIMURA Mutsuko (木村 睦子) Head, Section for Dictionary Research, NLRI
KENBOO Hidetoshi (見坊 豪紀) Researcher, Section for Dictionary Research, NLRI
HAYASHI Ooki Emeritus Researcher, Section for Dictionary Research, NLRI (20000002)
|
Project Period (FY) |
1987 – 1988
|
Project Status |
Completed (Fiscal Year 1988)
|
Budget Amount |
¥6,200,000 (Direct Cost: ¥6,200,000)
Fiscal Year 1988: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1987: ¥4,200,000 (Direct Cost: ¥4,200,000)
|
Keywords | OCR / Concordance / 『尋常小学国語読本』 / 『尋常小学国語読』 |
Research Abstract |
The purpose of this study is to develop a system for efficiently compiling a concordance using an optical character reader (OCR). The OCR reads and identifies handwritten characters on worksheets, including katakana, alphabetic letters, numerals, and other symbols, and feeds them into a computer. It can process information written on the worksheets such as word units, entry words, parts of speech, and homonym ID codes. The text chosen for this study is the "Jinjoo Shoogaku Kokugo Tokuhon", the state-compiled elementary school readers used nationwide from 1918 to 1938. The text contains about 100,000 words in total. The following work was completed during the two-year period: (1) Word unit identification for each entry word. (2) Input of style information (spoken, written, dialog, or verse) for each quotation via OCR worksheets. (3) Input of data such as entry words, parts of speech, and homonym IDs via OCR worksheets. (4) Readout by the OCR. (5) Correction of data. (6) Programming for processing the revised data. (7) Programming for KWIC output. (8) Printout of KWIC lists.
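For readers unfamiliar with the KWIC (keyword in context) format named in steps (7) and (8), the following is a minimal illustrative sketch in Python. It is not the project's program, which is not described in this record. The function names `kwic` and `format_kwic`, the `width` parameter, and the romanized sample tokens are all hypothetical, and the text is assumed to be already segmented into word units.

```python
from typing import List, Tuple


def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence.

    `tokens` is assumed to be text already split into word units;
    `width` is the number of word units kept on each side (an assumption).
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


def format_kwic(rows: List[Tuple[str, str, str]], left_width: int = 20) -> List[str]:
    """Right-align the left context so the keywords line up in one column."""
    return [f"{left:>{left_width}}  {kw}  {right}" for left, kw, right in rows]


# Hypothetical sample data, not taken from the readers themselves.
sample_tokens = "hana ga saku hana ga chiru".split()
sample_rows = kwic(sample_tokens, "hana", width=2)
for line in format_kwic(sample_rows):
    print(line)
```

Each output line centers one occurrence of the entry word between its neighboring word units, which is the layout a printed KWIC list conventionally uses.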
|