Development of OCR (optical character recognition) system for scientific documents

Research Project

Project/Area Number 10558056
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section Developmental Research
Research Field Information systems (including library and information science)
Research Institution KYUSHU UNIVERSITY

Principal Investigator

SUZUKI Masakazu  Kyushu University, Graduate School, Faculty of Mathematics, Professor (20112302)

Co-Investigator (Kenkyū-buntansha) FUKUDA Ryoji  Oita University, Faculty of Engineering, Associate Professor (70238492)
EJIMA Toshiaki  Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Professor (00124553)
TAMARI Fumikazu  Fukuoka University of Education, Faculty of Education, Professor (70036937)
YAMAGATA Hedeaki  Ricoh Co., Ltd., Software Research Center, 2nd Laboratory, Researcher
TACHIKAWA Michiyoshi  Ricoh Co., Ltd., Software Research Center, 2nd Laboratory, Laboratory Head
Project Period (FY) 1998 – 2001
Project Status Completed (Fiscal Year 2001)
Budget Amount
¥12,500,000 (Direct Cost: ¥12,500,000)
Fiscal Year 2001: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 2000: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1999: ¥2,500,000 (Direct Cost: ¥2,500,000)
Fiscal Year 1998: ¥6,000,000 (Direct Cost: ¥6,000,000)
Keywords OCR / Mathematical symbol recognition / Mathematical formula recognition / Document analysis / Digital library / Layout analysis / Document digitization / Segmentation / Handwritten mathematical expression recognition / Optical character recognition
Research Abstract

In this research, we developed an OCR system adapted to scientific documents, with a view to applying it to the retro-digitization of mathematical journals and the automatic Braille transcription of mathematical documents. The target images are scans of clearly printed documents made at 400 to 600 DPI.
Because no commercial OCR software could recognize mathematical symbols, we developed our own OCR engine. It recognizes about 450 kinds of characters and symbols used in mathematical expressions and reliably distinguishes italic from upright letters.
In text areas, several efficient post-processing methods use linguistic information to improve recognition results, whereas in mathematical expression areas, different post-processing methods based on the structure of mathematical notation are effective. We therefore developed algorithms that separate text areas from mathematical expression areas in both Japanese and English documents.
For the structure analysis of mathematical expressions, we developed a new method that is robust against character recognition errors and against confusion between similar characters of different sizes. We first construct a network that joins characters (symbols) by candidate relation links, each with a cost. We then re-evaluate the candidates using a cost that reflects the global structure of the mathematical expression, and finally obtain the recognition result as the minimum-cost spanning tree of the network. The advantage of this method is that local recognition errors are corrected automatically through the total cost of the recognition tree.
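The spanning-tree step described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Link` type, the relation names, the hand-picked costs, and the use of Kruskal's algorithm on an undirected network are all assumptions made for illustration, and the global re-evaluation of candidates is omitted.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """Hypothetical candidate link between two recognized symbols."""
    parent: str
    child: str
    relation: str  # e.g. "horizontal", "superscript", "subscript"
    cost: float    # lower cost = more plausible relation (assumed scale)


def minimum_spanning_tree(symbols, links):
    """Kruskal's algorithm: take the cheapest links that do not close a cycle."""
    # Union-find structure: each symbol starts as its own component.
    component = {s: s for s in symbols}

    def find(s):
        # Follow representatives to the component root, halving the path.
        while component[s] != s:
            component[s] = component[component[s]]
            s = component[s]
        return s

    tree = []
    for link in sorted(links, key=lambda l: l.cost):
        a, b = find(link.parent), find(link.child)
        if a != b:          # the link joins two separate components
            component[a] = b
            tree.append(link)
    return tree


# Toy network for the expression "x^2 + y" with competing candidate links.
symbols = ["x", "2", "+", "y"]
links = [
    Link("x", "2", "superscript", 1.0),
    Link("x", "2", "horizontal", 4.0),   # competing reading of the same pair
    Link("x", "+", "horizontal", 1.0),
    Link("2", "+", "horizontal", 3.0),
    Link("+", "y", "horizontal", 1.0),
    Link("2", "y", "subscript", 5.0),
]

tree = minimum_spanning_tree(symbols, links)
for link in tree:
    print(f"{link.parent} -[{link.relation}]-> {link.child} (cost {link.cost})")
```

In this toy network the expensive "horizontal" reading of the pair x, 2 is rejected because the cheaper "superscript" link already connects the two symbols, which mirrors how a locally ambiguous relation can be settled by the total cost of the tree.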
We also developed a handwriting interface for editing mathematical expressions, intended as an easy way for users to correct recognition errors in them.

Report

(5 results)
  • 2001 Annual Research Report   Final Research Report Summary
  • 2000 Annual Research Report
  • 1999 Annual Research Report
  • 1998 Annual Research Report
  • Research Products

    (33 results)

All Publications (33 results)

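  • [Publications] 能隅進一, 福田亮治, 玉利文和, 鈴木昌和: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE. J83-DII, No.3. 895-906 (2000)

    • Description
      From the Final Research Report Summary (Japanese)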
    • Related Report
      2001 Final Research Report Summary
  • [Publications] T.Kanahori, K.Tabata, W.Cong, F.Tamari, M.Suzuki: "On-Line Recognition of Mathematical Expressions Using Automatic Rewriting Method" Advances in Multimodal Interfaces ICMI2000, Lecture Notes in Computer Science, Springer. 1948. 394-401 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Y.Eto, M.Suzuki: "Mathematical Formula Recognition Using Virtual Link Network" Proceedings of the 6th International Conference on Document Analysis and Recognition, Seattle, IEEE Computer Society Press. 430-437 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] T.Kanahori, M.Suzuki: "A Recognition Method of Matrices by Using Variable Block Pattern Elements Generating Rectangular Areas" Proceedings of the 4th IAPR International Workshop on Graphics Recognition. 455-469 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] 村上玄生, 鈴木昌和: "Improvement of Mathematical structure analysis by using Center-Band" Technical Report of IEICE. PRMU2001-270. 203-210 (2002)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
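  • [Publications] 安藤英里子, 鈴木昌和: "Document recognition using real-time clustering of character images and more efficient correction work: English mathematics books" Technical Report of IEICE. PRMU2001-271. 211-218 (2002)

    • Description
      From the Final Research Report Summary (Japanese)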
    • Related Report
      2001 Final Research Report Summary
  • [Publications] S. Nouzumi, R. Fukuda, F. Tamari, M. Suzuki: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE. J83-DII No.3. 895-906 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] T. Kanahori, K. Tabata, W. Cong, F. Tamari, M. Suzuki: "On-Line Recognition of Mathematical Expressions Using Automatic Rewriting Method" Advances in Multimodal Interfaces-ICMI2000, Lecture Notes in Computer Science 1948, Springer. 394-401 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Y. Eto, M. Suzuki: "Mathematical Formula Recognition Using Virtual Link Network" Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, IEEE Computer Society Press. 430-437 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] T. Kanahori, M. Suzuki: "A Recognition Method of Matrices by Using Variable Block Pattern Elements Generating Rectangular Areas" Proceedings of the Fourth IAPR International Workshop on Graphics Recognition. 455-469 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] M. Murakami, M. Suzuki: "Improvement of Mathematical structure analysis by using Center-Band" Technical Report of IEICE, PRMU2001-270. 203-210 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] E. Ando, M. Suzuki: "Document recognition by real-time classifications of character images and reduction of correction labor of recognition results" Technical Report of IEICE, PRMU2001-271. 211-218 (2002)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Y.Eto, M.Suzuki: "Mathematical Formula Recognition Using Virtual Link Network" Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, IEEE Computer Society Press. 430-437 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] T.Kanahori, M.Suzuki: "A Recognition Method of Matrices by Using Variable Block Pattern Elements Generating Rectangular Areas" Proceedings of the Fourth IAPR International Workshop on Graphics Recognition. 455-469 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] 金堀利洋, 鈴木昌和: "Recognition of matrices using rectangular area partitioning by variable block patterns" IEICE Technical Report. PRMU2000-201. 1-6 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] 江藤裕子, 鈴木昌和: "Mathematical formula recognition using a virtual link network" IEICE Technical Report. PRMU2000-201. 7-14 (2001)

    • Related Report
      2001 Annual Research Report
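  • [Publications] 田畑耕一, 福田亮治, 鈴木昌和: "On-line recognition of alphanumeric characters and mathematical symbols combined with two-dimensional warping" IEICE Technical Report. PRMU2000-201. 23-30 (2001)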

    • Related Report
      2001 Annual Research Report
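  • [Publications] 中山優幸, 福田亮治, 鈴木昌和, 玉利文和: "Structure analysis of mathematical expressions by horizontal division of expressions using features of mathematical symbols" IEICE Technical Report. PRMU2000-201. 15-22 (2001)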

    • Related Report
      2001 Annual Research Report
  • [Publications] T.Kanahori: "On-Line Recognition of Mathematical Expressions Using Automatic Rewriting Method" Advances in Multimodal Interfaces ICMI2000, Lecture Notes in Computer Science, Springer. 1948. 394-401 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] R.Fukuda: "Optical Recognition and Braille Transcription of Mathematical Documents" Proceedings of the 7th International Conference on Computers Helping People with Special Needs (ICCHP), Karlsruhe. 711-718 (2000)

    • Related Report
      2000 Annual Research Report
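  • [Publications] 能隅進一: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE. J83-DII, No.3. 895-906 (2000)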

    • Related Report
      2000 Annual Research Report
  • [Publications] 金堀利洋: "Recognition of matrices using rectangular area partitioning by variable block patterns" Technical Report of IEICE. (to appear). (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 江藤裕子: "Mathematical formula recognition using a virtual link network" Technical Report of IEICE. (to appear). (2000)

    • Related Report
      2000 Annual Research Report
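  • [Publications] 田畑耕一: "On-line recognition of handwritten alphanumeric characters and mathematical symbols combined with two-dimensional warping" Technical Report of IEICE. (to appear). (2000)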

    • Related Report
      2000 Annual Research Report
  • [Publications] R.Fukuda: "A Technique of Mathematical Expression Structure Analysis for the Handwriting Input System" Proceedings of 5th ICDAR, Bangalore. 131-134 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] H.Okamura: "Handwriting Interface for Computer Algebra Systems" Proceedings of 4th ATCM, Guangzhou. 291-300 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 江藤裕子: "Off-line syntactic recognition of mathematical expressions using minimum-cost spanning tree search" Technical Report of IEICE, PRMU. 99-236. 37-43 (2000)

    • Related Report
      1999 Annual Research Report
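  • [Publications] 能隅進一: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE. J83-DII, No.3 (to appear). (2000)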

    • Related Report
      1999 Annual Research Report
  • [Publications] M.Sha: "On-Line Recognition of Handwriting Mathematical Formulas via Networks" Proceedings of Third Asian Technology Conference in Mathematics, Springer. 271-279 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] K.Inoue: "Optical Recognition of Printed Mathematical Documents" Proceedings of Third Asian Technology Conference in Mathematics, Springer. 280-289 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] S.Nouzumi: "Optical Recognition System of Printed Japanese Mathematical Documents." Proceedings of Third IAPR Workshop on Document Analysis. 197-200 (1998)

    • Related Report
      1998 Annual Research Report
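  • [Publications] 能隅進一: "Speed-oriented mathematical symbol recognition and its application to recognizing printed Japanese documents containing mathematical expressions" Technical Report of IEICE. 98・136. 1-8 (1998)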

    • Related Report
      1998 Annual Research Report
  • [Publications] 業偉: "On-line recognition of handwritten mathematical expressions using mutual relations between strokes" Technical Report of IEICE. 98・136. 9-16 (1998)

    • Related Report
      1998 Annual Research Report

Published: 1998-04-01   Modified: 2016-04-21  
