1992 Fiscal Year Final Research Report Summary

ON THE OCR APPROACH TO CREATING FULL-TEXT DATA BASE OF JAPANESE CLASSICAL LITERATURE

Research Project

Project/Area Number 04610271
Research Category

Grant-in-Aid for General Scientific Research (C)

Allocation Type Single-year Grants
Research Field Japanese Literature
Research Institution National Institute of Japanese Literature

Principal Investigator

HARA Shoichiro  National Institute of Japanese Literature, Research Information Department, Associate Professor (50218616)

Project Period (FY) 1992
Keywords IMAGE PROCESSING / JAPANESE CLASSICAL LITERATURE / OCR / IMAGE CLASSIFICATION / DISCRIMINANT THRESHOLD SELECTION METHOD / CLUSTER ANALYSIS / NOISE REDUCTION
Research Abstract

A new approach to reducing the image noise that disturbs optical character recognition was studied. The distinctive feature of the study is its use of color information to better separate "true" letters from image noise such as red letters, the paper itself, and pseudo-letters, that is, letters written on the reverse side of translucent paper that show through. Original Japanese classical books written in black Chinese ink on white traditional Japanese paper were selected as the research samples.
The results are as follows:
(1) Characteristics of Color Distribution: Original images were digitized with a color image scanner (100 dpi, 256 gray levels for each of R, G, and B), and each picture cell (pixel) was represented as a 3-dimensional vector in RGB chromaticity coordinates and then analyzed. The characteristics of the color distribution are: (a) most picture cells are distributed along the line R=G=B; (b) red letters have a color distribution distinct from that in (a); (c) the brightness histograms of the R, G, and B channels are almost bimodal.
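The color characteristics in (1) can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the sample pixel values, and the two cutoffs (`min_distance`, `min_red_share`) are assumptions chosen only for illustration. The sketch computes chromaticity coordinates and the distance of a pixel from the line R=G=B, then flags pixels that lie far from that line with a dominant red share.

```python
def chromaticity(r, g, b):
    """Return each channel's share of R+G+B (the three shares sum to 1)."""
    total = r + g + b
    if total == 0:
        # Pure black has no defined chromaticity; treat it as neutral (assumption).
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)


def gray_axis_distance(r, g, b):
    """Euclidean distance of an RGB pixel from the line R=G=B."""
    mean = (r + g + b) / 3
    return ((r - mean) ** 2 + (g - mean) ** 2 + (b - mean) ** 2) ** 0.5


def is_reddish(r, g, b, min_distance=40.0, min_red_share=0.45):
    """Flag a pixel that is far from the gray axis and dominated by red.

    Both cutoffs are illustrative guesses, not values from the report.
    """
    red_share = chromaticity(r, g, b)[0]
    return gray_axis_distance(r, g, b) >= min_distance and red_share >= min_red_share


# Hypothetical sample pixels: black ink, paper, and a red annotation.
samples = {"ink": (30, 28, 32), "paper": (230, 225, 210), "red": (190, 60, 50)}
for name, rgb in samples.items():
    print(name, round(gray_axis_distance(*rgb), 1), is_reddish(*rgb))
```

In this sketch, ink and paper pixels stay close to the line R=G=B and differ mainly in brightness, while the red pixel lies far from it, which is the property that (a) and (b) describe.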
(2) Classification of Images: (a) Characteristics (a) and (b) in (1) are useful for distinguishing red letters from the other image components. (b) The discriminant threshold selection method (Otsu's method) was applied to each brightness histogram to determine the threshold between black-letter and paper segments. This method separates the two segments sharply, but it tends to slice off the peripheral picture cells of the "true" black letters. (c) Cluster analysis was introduced to classify "true" black letters and paper segments more precisely, and it gave better results.
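The two classification steps in (2)(b) and (2)(c) can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation. The report does not say which cluster analysis was used, so a simple two-cluster k-means on RGB vectors stands in for it here. The function names and sample data are hypothetical.

```python
def otsu_threshold(histogram):
    """Return the level t that maximizes the between-class variance.

    Pixels with level <= t form the dark class, the rest the bright class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of their levels
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance up to a constant factor
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def sq_dist(p, q):
    """Squared Euclidean distance between two RGB vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(pixels, iterations=20):
    """Two-cluster k-means on RGB vectors; label 0 = dark cluster, 1 = bright cluster.

    Centers start at the darkest and brightest pixels (an assumed initialization).
    """
    centers = [tuple(min(pixels, key=sum)), tuple(max(pixels, key=sum))]
    labels = []
    for _ in range(iterations):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in pixels]
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the old center for an empty cluster
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers


# Hypothetical bimodal brightness histogram: ink near level 30, paper near level 220.
hist = [0] * 256
hist[30], hist[220] = 100, 300
print(otsu_threshold(hist))

# Hypothetical pixels: two ink pixels, one grayish stroke-edge pixel, two paper pixels.
pixels = [(30, 28, 32), (40, 35, 38), (90, 85, 80), (225, 220, 205), (230, 225, 210)]
print(two_means(pixels)[0])
```

The threshold works on one brightness channel at a time, whereas the clustering uses all three channels of each pixel together. The sketch shows only how the two techniques operate and makes no claim to reproduce the accuracy difference reported in (c).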
This study verifies the usefulness of color information for eliminating image noise.

Published: 1994-03-24  
