
Development of a Concordance-Making System Utilizing an Optical Character Reader

Research Project

Project/Area Number 62450054
Research Category

Grant-in-Aid for General Scientific Research (B)

Allocation Type Single-year Grants
Research Field Japanese Linguistics
Research Institution National Language Research Institute

Principal Investigator

HIDA Yoshifumi  Director, Department of Language Change, NLRI (40000418)

Co-Investigator (Kenkyū-buntansha) KATOO Nobuaki  Researcher, Section for Dictionary Research, NLRI
SAITOO Hidenori  Head, 3rd Research Section, Department of Computational Linguistics, NLRI (70000429)
KIMURA Mutsuko  Head, Section for Dictionary Research, NLRI
KENBOO Hidetoshi  Researcher, Section for Dictionary Research, NLRI
HAYASHI Ooki  Emeritus Researcher, NLRI, Section for Dictionary Research (20000002)
Project Period (FY) 1987 – 1988
Project Status Completed (Fiscal Year 1988)
Budget Amount
¥6,200,000 (Direct Cost: ¥6,200,000)
Fiscal Year 1988: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1987: ¥4,200,000 (Direct Cost: ¥4,200,000)
Keywords OCR / Concordance / 『尋常小学国語読本』 (Jinjoo Shoogaku Kokugo Tokuhon)
Research Abstract

The purpose of this study is to develop a system for compiling a concordance efficiently with the aid of an optical character reader (OCR).
The OCR reads and identifies characters handwritten on worksheets, including katakana, letters of the alphabet, numerals, and other symbols, and feeds them into a computer. It can therefore process the information written on the worksheets, such as word units, entry words, parts of speech, and homonym ID codes.
The text chosen for this study is the "Jinjoo Shoogaku Kokugo Tokuhon", the state-compiled elementary school readers used nationwide from 1918 to 1938. The text contains a total of about 100,000 words.
The following work was completed during the two-year period:
(1) Identification of word units for each entry word.
(2) Input, via OCR worksheets, of style information (spoken, written, dialogue, or verse) for each quotation.
(3) Input, via OCR worksheets, of data such as entry words, parts of speech, and homonym IDs.
(4) Readout by the OCR.
(5) Correction of the data.
(6) Programming for processing the revised data.
(7) Programming for KWIC output.
(8) Printout of KWIC lists.
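The abstract lists KWIC (key word in context) output as a programming step but does not describe the program itself. The following is a minimal Python sketch of how KWIC lines might be produced once the corrected OCR data have been turned into a sequence of word-unit records. It is not the NLRI system: the record fields, the names `Token`, `KwicLine`, `kwic`, and `format_kwic`, and the romanized sample sentence are all hypothetical stand-ins for the worksheet items (entry word, part of speech, homonym ID) mentioned above.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One word-unit record. Field names are hypothetical stand-ins
    for the worksheet items (entry word, part of speech, homonym ID)."""
    surface: str     # form as it occurs in the running text
    entry: str       # entry word (headword) the form is filed under
    pos: str         # part-of-speech code
    homonym_id: int  # separates homographic entry words


class KwicLine(NamedTuple):
    entry: str
    homonym_id: int
    position: int    # index of the keyword in the running text
    left: str
    keyword: str
    right: str


def kwic(tokens: list[Token], width: int = 3, sep: str = " ") -> list[KwicLine]:
    """Build one KWIC line per token, sorted by entry word, homonym ID,
    and text position. Pass sep="" for unsegmented Japanese text."""
    lines = []
    for i, tok in enumerate(tokens):
        left = sep.join(t.surface for t in tokens[max(0, i - width):i])
        right = sep.join(t.surface for t in tokens[i + 1:i + 1 + width])
        lines.append(KwicLine(tok.entry, tok.homonym_id, i, left, tok.surface, right))
    lines.sort(key=lambda ln: (ln.entry, ln.homonym_id, ln.position))
    return lines


def format_kwic(lines: list[KwicLine], col: int = 20) -> list[str]:
    """Right-align the left context so every keyword starts in the same column."""
    return [f"{ln.left:>{col}} [{ln.keyword}] {ln.right}" for ln in lines]


# Hypothetical romanized sample: "hana ga saku. hato ga tobu."
sample = [
    Token("hana", "hana", "noun", 1),
    Token("ga", "ga", "particle", 1),
    Token("saku", "saku", "verb", 1),
    Token("hato", "hato", "noun", 1),
    Token("ga", "ga", "particle", 1),
    Token("tobu", "tobu", "verb", 1),
]
for row in format_kwic(kwic(sample, width=2), col=10):
    print(row)
```

Sorting on the entry word and homonym ID, rather than on the surface form, is one plausible way the worksheet data could group every occurrence of a headword together in the printed lists; the actual ordering used in the project is not stated in the abstract.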

Report

  • 1988 Annual Research Report
  • Final Research Report Summary
  • 1987 Annual Research Report


Published: 1987-04-01   Modified: 2016-04-21  


Powered by NII kakenhi