Studies on OCR for Historical Document
Grant-in-Aid for Scientific Research (B)
|Allocation Type||Single-year Grants |
|Research Institution||Osaka City University |
SHIBAYAMA Mamoru Osaka City Univ., Media Center, Professor (10162645)
NAMIKI Mitaro Tokyo University of Agriculture and Technology, Faculty of Engineering, Associate Prof. (10208077)
TSUKADA Takashi Osaka City Univ., Faculty of Literature, Associate Prof.; Graduate School of Literature, Professor (60126125)
YAMADA Shoji International Research Center for Japanese Studies, Research Division, Associate Prof. (20248751)
HOSHINO Satoshi Kyoto Univ., Professor Emeritus (90025867)
KAWAGUCHI Hiroshi Tezukayama Univ., Faculty of Information and Management, Associate Prof. (80224749)
OSHIMA Mario Osaka City Univ., Faculty of Economics, Professor (30128730)
|Project Period (FY)||1999 – 2001 |
Completed (Fiscal Year 2001)
|Budget Amount
¥5,900,000 (Direct Cost: ¥5,900,000)
Fiscal Year 2001: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2000: ¥2,400,000 (Direct Cost: ¥2,400,000)
Fiscal Year 1999: ¥2,100,000 (Direct Cost: ¥2,100,000)
|Keywords||Historical Document Images / OCR / Character Recognition / Character Segmentation / Recognition Dictionary / Transliteration / Historical Document Recognition / Historical Document Transcription Support / Early Modern Documents / Automatic Reading|
This research is a trial study that attempts to develop OCR (interpreted here as automatic recognition) for images of early modern historical documents, and to elucidate the mechanisms of character recognition for historical documents written with a brush in cursive styles. The research also aims to open a new perspective in Japanese historical studies by introducing a basic, limited character recognition system as a support tool.
The research results are as follows.
(1) To build the dictionary for character recognition, characters were segmented from the documents and the related segmentation programs were developed.
(2) As basic research on the segmentation and recognition of historical document characters, layout recognition of document images and automatic extraction of document titles were carried out. In the character recognition experiments, a new system that does not require segmentation of cursive characters was introduced.
(3) To support the transliteration of documents, the n-gram method was used and its effectiveness was confirmed.
(4) In the historical document character recognition process, the regularizing (normalization) operation was found to increase similarity. A new system therefore needs to be researched in the next stage.
(5) A character database focused on document titles was developed. This database, which contains about 900 titles and 192 kinds of characters, has been made publicly available.
For details, refer to the research reports "Research of the historical document transcription support system" (1) and (2), published in March 2000 and March 2001, respectively.
Report (4 results)
Research Products (16 results)