Studies on Development of OCR system for Historical Documents and Application to Technologies in Electronic Dictionary

Research Project

Project/Area Number 12558037
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section Developmental Research
Research Field Information systems (including information and library science)
Research Institution Osaka City University

Principal Investigator

SHIBAYAMA Mamoru  Osaka City Univ., Media Center, Professor (10162645)

Co-Investigator (Kenkyū-buntansha)

NAMIKI Mitaro  Tokyo University of Agriculture and Technology, Faculty of Engineering, Associate Prof (10208077)
HARA Shoichiro  National Institute of Japanese Literature, Research Information Department, Associate Prof (50218616)
YAMADA Shoji  International Research Center for Japanese Studies, Research Division, Associate Prof (20248751)
IWASAKI Hiroshi  Kyoto Univ., Professor Emeritus; Faculty of Community Development, Professor (50087904)
KAWAGUCHI Hiroshi  Tezukayama Univ., Faculty of Information and Management, Associate Prof (80224749)
Project Period (FY) 2000 – 2002
Project Status Completed (Fiscal Year 2002)
Budget Amount
¥5,800,000 (Direct Cost: ¥5,800,000)
Fiscal Year 2002: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 2001: ¥3,800,000 (Direct Cost: ¥3,800,000)
Keywords Historical Document Images / OCR / Character Recognition / Character Segmentation / Recognition Dictionary / Transliteration / Support for Deciphering Historical Documents
Research Abstract

The purpose of this research is to build electronic versions of two dictionaries, the "Kuzushiji Kaidoku dictionary" and the "Kuzushiji Yourei dictionary", which specialists in history, paleography, and literature use to decipher handwritten historical documents, so that they can be consulted on computers, including mobile and notebook machines, and to develop a computerized dictionary that can be used in a mobile environment.
A further aim is to apply these dictionaries directly to character recognition research for the transliteration support system for historical documents (historical document OCR).
The following results were obtained during the research period.
(1) The index images of the "Kuzushiji Yourei dictionary" (which allows letter shapes and examples of letter use to be retrieved through the stroke (Kihitsu-jun) index) were input together with attributes such as the "Kuzushiji Yourei dictionary code", the "Mojikyo code", and the Shift-JIS internal code, and an electronic Moji database was built.
(2) A retrieval function that lets the user search for similar characters in the above-mentioned dictionary was developed.
(3) The n-gram method was applied to research on the historical document transliteration support system (historical document OCR), and n-grams were confirmed to be effective for inferring lost or missing characters in a document.
(4) To build a character pattern dictionary of about 240,000 characters from historical documents for use in the recognition process, a segmentation program was developed and character selection work was carried out.
(5) The second edition of the HCD series in the historical document character database was produced as one of the computerized dictionaries. (a) HCD2: title lines of debt bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, binary format. (b) HCD2a: title lines of bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, 256 gray levels. (c) HCD2b: title lines of debt bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, 24-bit color format. (d) HCD3: title lines of debt bonds, Fushimiya Zenbei documents, 183 character types, 4,933 characters, binary format.
(6) Character recognition focused on the title lines of documents was carried out using the above-mentioned dictionary. Recognition techniques that match character patterns in a title line without first segmenting the line into individual characters were studied.
(7) A study on estimating the stroke order of characters extracted from the "Database of Kuzushiji Kaidoku dictionary", which was built from that dictionary, was carried out. In addition to papers on the historical document transliteration support system, research reports on this study, including an interim version, were published in March 2001 and 2000, respectively.
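Result (3) can be illustrated with a small sketch. The abstract does not describe the project's actual n-gram model, so the following is only one plausible reading: a character bigram model that ranks candidates for a single missing character by how often each candidate follows its left neighbor and precedes its right neighbor. The function names, the add-one smoothing, and the three-line toy corpus are assumptions made for illustration and are not taken from the project's data.

```python
from collections import Counter


def build_bigram_counts(corpus):
    """Count adjacent character pairs over a list of transcribed lines."""
    counts = Counter()
    for line in corpus:
        for left, right in zip(line, line[1:]):
            counts[(left, right)] += 1
    return counts


def rank_candidates(counts, left, right, alphabet):
    """Rank candidate characters for a gap between `left` and `right`.

    Score = (count(left, c) + 1) * (count(c, right) + 1), i.e. bigram
    counts with add-one smoothing (an assumption, not the source's method).
    Ties are broken by character order so the result is deterministic.
    """
    scored = []
    for c in alphabet:
        score = (counts[(left, c)] + 1) * (counts[(c, right)] + 1)
        scored.append((score, c))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for _, c in scored]


# Toy corpus of title-like lines (hypothetical, for illustration only).
corpus = ["借用申金子之事", "借用申銀子之事", "借用申金子之事"]
counts = build_bigram_counts(corpus)
alphabet = sorted({ch for line in corpus for ch in line})

# Suppose the character between "申" and "子" is illegible.
ranking = rank_candidates(counts, "申", "子", alphabet)
print(ranking[:3])
```

In this toy setting the candidate seen most often in the same context ranks first. A practical system would need a much larger transcribed corpus and would probably combine such scores with the image-based recognition results.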

Report

(4 results)
  • 2002 Annual Research Report   Final Research Report Summary
  • 2001 Annual Research Report
  • 2000 Annual Research Report
  • Research Products

All Publications (20 results)

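  • [Publications] Shoji Yamada, Mamoru Shibayama, et al.: "Development of Digital Dictionary of Historical Characters with Search Function of Similar Characters" IPSJ SIG Technical Report 2002-CH-54. Vol.2002, No.23. 43-50 (2002)

    • Description
      From the Final Research Report Summary (Japanese)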
    • Related Report
      2002 Final Research Report Summary
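  • [Publications] Shoji Yamada, Mamoru Shibayama, et al.: "Studies on Character Recognition for Historical Documents" Information Processing. Vol.43, No.9. 950-955 (2002)

    • Description
      From the Final Research Report Summary (Japanese)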
    • Related Report
      2002 Final Research Report Summary
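  • [Publications] Hirohito Kondo, Ryuichi Matsumoto, Mamoru Shibayama, Shoji Yamada, Yoshihiko Araki: "Character Recognition without Segmentation for Titles in Historical Document Images" IPSJ SIG Technical Report 2003-CH-57. Vol.2003, No.5. 1-8 (2002)

    • Description
      From the Final Research Report Summary (Japanese)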
    • Related Report
      2002 Final Research Report Summary
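  • [Publications] Kota Abe, Makiko Nakatsuka, Mamoru Shibayama: "An Attempt to Extract Stroke Order from Character Images of the 'Kuzushiji Kaidoku Jiten'" Bulletin of Osaka City University Media Center. Vol.4. 19-23 (2003)

    • Description
      From the Final Research Report Summary (Japanese)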
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Kota Abe, Makiko Nakatsuka, and Mamoru Shibayama: "An Attempt to Extract Stroke Order from Handwritten Cursive Japanese Character Image" Bulletin of Osaka City University Media Center. 14. (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Hirohito Kondo, Ryuichi Matsumoto, Mamoru Shibayama, and Yoshihiko Araki: "Character Recognition without Segmentation for Title in Historical Document Images" IPSJ SIG-Report 2002. 57. 1-8 (2003)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Shoji Yamada and Mamoru Shibayama: "Studies on Character Recognition for Historical Document" Information Processing. 43 No.9. 950-955 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Shoji Yamada, Yuji Waizumi, Nei Kato, and Mamoru Shibayama: "Development of Digital Dictionary of Historical Characters with Search Function of Similar Characters" IPSJ SIG-Report 2002. 54. 43-50 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Shoji Yamada, Nei Kato, Mamoru Shibayama, et al.: "Historical Character Recognition (HCR) Project Report (2)" IPSJ SIG-Report 2001. 50. 9-16 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Koji OZAKI, Mamoru SHIBAYAMA, and Yoshihiko ARAKI: "Layout Recognition and Title Extraction for Historical Document Image" Proceedings of Symposium on Computer and the Humanities, IPSJ. (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
  • [Publications] Shoji YAMADA, Mamoru SHIBAYAMA: "A study of a historical document research supporting system using n-gram" IPSJ Symposium Series. 2000, No.17. 185-192 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2002 Final Research Report Summary
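  • [Publications] Shoji Yamada, Mamoru Shibayama, et al.: "Development of Digital Dictionary of Historical Characters with Search Function of Similar Characters" IPSJ SIG Technical Report 2002-CH-54. Vol.2002, No.23. 43-50 (2002)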

    • Related Report
      2002 Annual Research Report
  • [Publications] Shoji Yamada, Mamoru Shibayama: "Studies on Character Recognition for Historical Documents" Information Processing. Vol.43, No.9. 950-955 (2002)

    • Related Report
      2002 Annual Research Report
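  • [Publications] Hirohito Kondo, Ryuichi Matsumoto, Mamoru Shibayama, Shoji Yamada, Yoshihiko Araki: "Character Recognition without Segmentation for Titles in Historical Document Images" IPSJ SIG Technical Report 2003-CH-57. Vol.2003, No.5. 1-8 (2002)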

    • Related Report
      2002 Annual Research Report
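  • [Publications] Kota Abe, Makiko Nakatsuka, Mamoru Shibayama: "An Attempt to Extract Stroke Order from Character Images of the 'Kuzushiji Kaidoku Jiten'" Bulletin of Osaka City University Media Center. Vol.4. 19-23 (2003)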

    • Related Report
      2002 Annual Research Report
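  • [Publications] Shoji Yamada, Mamoru Shibayama, et al.: "Development of Digital Dictionary of Historical Characters with Search Function of Similar Characters" IPSJ SIG Technical Report 2002-CH-54.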

    • Related Report
      2001 Annual Research Report
  • [Publications] Koji Ozaki, Mamoru Shibayama, et al.: "Layout Recognition and Title Extraction for Historical Document Images" IPSJ SIG Technical Report. 2000・67. 47-54 (2000)

    • Related Report
      2000 Annual Research Report
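  • [Publications] Shoji Yamada, Mamoru Shibayama: "A Study of Transliteration Support for Historical Bond Documents Using n-gram" Proceedings of the Symposium on Humanities and Computers 2000.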

    • Related Report
      2000 Annual Research Report
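  • [Publications] Koji Ozaki, Mamoru Shibayama, et al.: "Title Character Segmentation for Historical Document Images" Proceedings of the Symposium on Humanities and Computers 2000.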

    • Related Report
      2000 Annual Research Report
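  • [Publications] Mamoru Shibayama: "On the Construction and Use of a Character Recognition Dictionary for Titles of Historical Bond Documents" Kyoto University Data Processing Center, 67th Research Seminar. (2001)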

    • Related Report
      2000 Annual Research Report

Published: 2001-04-01   Modified: 2016-04-21  

Powered by NII kakenhi