Studies on OCR for Historical Document

Research Project

Project/Area Number	11410090
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Japanese history
Research Institution	Osaka City University
Principal Investigator	SHIBAYAMA Mamoru, Osaka City Univ., Media Center, Professor (10162645)
Co-Investigator (Kenkyū-buntansha)	NAMIKI Mitaro, Tokyo University of Agriculture and Technology, Faculty of Engineering, Associate Prof. (10208077); TSUKADA Takashi, Osaka City Univ., Faculty of Literature, Associate Prof. (Japanese entry: Graduate School of Literature, Professor) (60126125); YAMADA Shoji, International Research Center for Japanese Studies, Research Division, Associate Prof. (20248751); HOSHINO Satoshi, Kyoto Univ., Professor Emeritus (90025867); KAWAGUCHI Hiroshi, Tezukayama Univ., Faculty of Information and Management, Associate Prof. (80224749); OSHIMA Mario (大島真理夫), Osaka City University, Faculty of Economics, Professor (30128730)
Project Period (FY)	1999 – 2001
Project Status	Completed (Fiscal Year 2001)
Budget Amount	¥5,900,000 (Direct Cost: ¥5,900,000); Fiscal Year 2001: ¥1,400,000 (Direct Cost: ¥1,400,000); Fiscal Year 2000: ¥2,400,000 (Direct Cost: ¥2,400,000); Fiscal Year 1999: ¥2,100,000 (Direct Cost: ¥2,100,000)
Keywords	Historical Document Images / OCR / Character Recognition / Character Segmentation / Recognition Dictionary / Transliteration / Historical Document Recognition / Historical Document Transcription Support / Early Modern Documents / Automatic Reading
Research Abstract	This research is a trial study toward an OCR system (interpreted here as automatic recognition) for images of early modern historical documents. It aims to elucidate the mechanism of character recognition for historical documents written in cursive style with a writing brush, and to open a new perspective in Japanese historical studies by introducing a basic, limited character recognition system as a support tool. The results are as follows. (1) To build the dictionary used for character recognition, characters were segmented from the documents and the related segmentation programs were developed. (2) As basic research on segmenting and recognizing historical document characters, layout recognition of document images and automatic extraction of document titles were carried out. In the character recognition experiments, a new method that does not require segmentation of cursive characters was introduced. (3) To support transliteration of the documents, the n-gram method was applied and its effectiveness was confirmed. (4) In the character recognition process for historical documents, the regularizing operation was found to increase similarity; a new method therefore needs to be investigated in the next stage. (5) A character database focused on document titles was developed. The database, which contains about 900 titles and 192 kinds of characters, has been made public. For details, see the research reports "Research of the historical document transcription support system" (1) and (2), published in March 2000 and March 2001, respectively.

Report

(4 results)

2001 Annual Research Report, Final Research Report Summary
2000 Annual Research Report
1999 Annual Research Report

Research Products
(16 results)

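[Publications] 富田宏章, 柴山守 et al.: "古文書画像の2値化レベル制御による対話型文字分割とその評価" (Interactive Character Segmentation of Ancient Documents by Controlling Binary Level and its Evaluation). 電気学会論文誌C (Trans. IEE of Japan, C). 118・C・4. 503-509 (1999)
- Description
  From the Final Research Report Summary (Japanese)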
- Related Report
  2001 Final Research Report Summary
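[Publications] 山田奨治, 柴山守: "n-gramによる古文書証文類翻刻支援の検討" (A study of a Historical document research supporting system using n-gram). 情報処理学会人文科学とコンピュータシンポジウム論文集 (IPSJ Symposium Series). 2000・17. 185-192 (2000)
- Description
  From the Final Research Report Summary (Japanese)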
- Related Report
  2001 Final Research Report Summary
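[Publications] 尾崎浩司, 柴山守 et al.: "古文書画像の標題文字セグメンテーション" (Title Character Segmentation for Historical Document Images). 情報処理学会人文科学とコンピュータシンポジウム論文集 (IPSJ Symposium Series). 2000・17. 279-286 (2000)
- Description
  From the Final Research Report Summary (Japanese)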
- Related Report
  2001 Final Research Report Summary
[Publications] Hiroaki TOMITA, Mamoru SHIBAYAMA et al.: "Interactive Character Segmentation of Ancient Documents by Controlling Binary Level and its Evaluation" Trans. IEE of Japan. Vol.118-C, No.4. 503-509 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Shoji YAMADA, Mamoru SHIBAYAMA: "A study of a Historical document research supporting system using n-gram" IPSJ Symposium Series. Vol.2000, No.17. 185-192 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Koji Ozaki, Mamoru SHIBAYAMA et al.: "Title Character Segmentation for Historical Document Images" IPSJ Symposium Series. Vol.2000, No.17. 279-286 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
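[Publications] 山田奨治, 柴山守 et al.: "古文書翻刻支援システム開発プロジェクト報告(2)" (Report on the historical document transcription support system development project (2)). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2001-CH-50. 2001・5. 9-15 (2001)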
- Related Report
  2001 Annual Research Report
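[Publications] 山田奨治, 柴山守 et al.: "類似文字検索機能をそなえた電子くずし字辞典の開発" (Development of an electronic dictionary of cursive characters with a similar-character search function). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2002-CH-54. (scheduled). (2002)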
- Related Report
  2001 Annual Research Report
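[Publications] 山田奨治, 柴山守 et al.: "古文書翻刻支援システム開発プロジェクト報告(1)" (Report on the historical document transcription support system development project (1)). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2000-CH-45. 2000・5. 1-8 (2000)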
- Related Report
  2000 Annual Research Report
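[Publications] 和泉勇治, 加藤寧 et al.: "ニューラルネットワークを用いた古文書文字認識に関する一検討" (A study on historical document character recognition using neural networks). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2000-CH-45. 2000・5. 9-15 (2000)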
- Related Report
  2000 Annual Research Report
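[Publications] 山田奨治, 柴山守 et al.: "n-gramによる古文書証文類翻刻支援の検討" (A study of a Historical document research supporting system using n-gram). 人文科学とコンピュータシンポジウム2000論文集 (Proceedings of the Symposium on Humanities and Computers 2000). (2000)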
- Related Report
  2000 Annual Research Report
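[Publications] 尾崎浩司, 柴山守 et al.: "古文書画像のレイアウト認識と標題抽出" (Layout recognition and title extraction for historical document images). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2000-CH-47. 2000・67. 47-54 (2000)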
- Related Report
  2000 Annual Research Report
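[Publications] 尾崎浩司, 柴山守 et al.: "古文書画像の標題文字セグメンテーション" (Title Character Segmentation for Historical Document Images). 人文科学とコンピュータシンポジウム2000論文集 (Proceedings of the Symposium on Humanities and Computers 2000). (2000)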
- Related Report
  2000 Annual Research Report
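[Publications] 尾崎浩司, 柴山守, 荒木義彦: "古文書レイアウト画像のピラミット型抽象化と標題の自動抽出" (Pyramid-type abstraction of historical document layout images and automatic title extraction). 電気関係学会関西支部連合大会論文集 (Proceedings of the Kansai-section Joint Convention of Institutes of Electrical Engineering) G12-6. G266 (1999)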
- Related Report
  1999 Annual Research Report
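[Publications] 山田奨治 et al.: "古文書翻刻支援システム開発プロジェクト報告(1)" (Report on the historical document transcription support system development project (1)). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2000-CH-45. 2000.8. 1-8 (2000)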
- Related Report
  1999 Annual Research Report
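[Publications] 和泉勇治, 加藤寧 et al.: "ニューラルネットワークを用いた古文書個別文字認識に関する一検討" (A study on recognition of individual characters in historical documents using neural networks). 情報処理学会研究報告 (IPSJ SIG Technical Report) 2000-CH-45. 2000.8. 9-15 (2000)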
- Related Report
  1999 Annual Research Report
